Abstract. Shape analysis concerns the problem of determining "shape invariants" for programs that perform destructive updating on dynamically allocated storage. In recent work,
we have shown how shape analysis can be performed, using an abstract interpretation based 
on 3-valued first-order logic. In that work, concrete stores are finite 2-valued logical struc- 
tures, and the sets of stores that can possibly arise during execution are represented (conser- 
vatively) using a certain family of finite 3-valued logical structures. In this paper, we show 
how 3-valued structures that arise in shape analysis can be characterized using formulas in 
first-order logic with transitive closure. We also define a non-standard ("supervaluational") 
semantics for 3-valued first-order logic that is more precise than a conventional 3-valued 
semantics, and demonstrate that the supervaluational semantics can be effectively imple- 
mented using existing theorem provers. 

1 Introduction 

Abstraction and abstract interpretation [7] are key tools for automatically verifying 
properties of systems, both for hardware systems [5,8] and software systems [32]. In 
abstract interpretation, sets of concrete stores are represented in a conservative manner 
by abstract values (as explained below). Each transition of the system is given an inter- 
pretation over abstract values that is conservative with respect to its interpretation over 
corresponding sets of concrete stores; that is, the result of "executing" a transition must 
be an abstract value that describes a superset of the concrete stores that actually arise. 
This methodology guarantees that the results of abstract interpretation overapproximate 
the sets of concrete stores that actually arise at each point in the system. 

One issue that arises when abstraction is employed concerns the expressiveness of 
the abstraction method: "What collections of concrete states can be expressed exactly 
using the given abstraction method?" A second issue that arises when abstraction is 
employed is how to extract information from an abstract value. For instance, this is a 
fundamental problem for clients of abstract interpretation, such as verification tools, 
program optimizers, program-understanding tools, etc., which need to be able to interpret what an abstract value means. An abstract value $a$ represents a set of concrete stores $X$; ideally, a query $\varphi$ should return an answer that summarizes the result of posing $\varphi$ against each concrete store $S \in X$:

- If $\varphi$ is true for each $S \in X$, the summary answer should be "true".

- If $\varphi$ is false for each $S \in X$, the summary answer should be "false".

- If $\varphi$ is true for some $S \in X$ but false for some $S' \in X$, the summary answer should be "unknown".
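The three-way summary can be illustrated on a finite set of stores. The C sketch below assumes, purely for illustration, that $X$ is finite and non-empty and that $\varphi$ has already been evaluated on each store, giving one Boolean per store. The names `answer` and `summarize` are hypothetical. In general $X$ may be infinite, so this enumeration is only a conceptual model of the summary answer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Possible summary answers for a query posed against a set of stores. */
typedef enum { ANSWER_FALSE, ANSWER_UNKNOWN, ANSWER_TRUE } answer;

/* results[i] holds the 2-valued answer of the query on the i-th store of a
   finite, explicitly enumerated set X (assumed to be non-empty). */
answer summarize(const bool *results, size_t n) {
    size_t n_true = 0;
    for (size_t i = 0; i < n; i++)
        if (results[i])
            n_true++;
    if (n_true == n)
        return ANSWER_TRUE;      /* true for each store */
    if (n_true == 0)
        return ANSWER_FALSE;     /* false for each store */
    return ANSWER_UNKNOWN;       /* true for some store, false for another */
}
```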

This paper presents results on both of these questions, for a class of abstractions that 
originally arose in work on the problem of shape analysis [21,4, 37]. Shape analysis 
concerns the problem of finding "shape descriptors" that characterize the shapes of 
the data structures that a program's pointer variables point to. Shape analysis is one 
of the most challenging problems in abstract interpretation because it generally deals 
with programs written in languages like C, C++, and Java, which allow (i) dynamic 
allocation and deallocation of cells from the heap, (ii) destructive updating of structure 
fields, and, in the case of Java, (iii) dynamic creation and destruction of threads. This 
combination of features creates considerable difficulties for any abstract-interpretation 
method. 

The motivation for the present paper was to understand the expressiveness of the 
shape abstractions defined in [37]. In that work, concrete stores are finite 2-valued log- 
ical structures, and the sets of stores that can possibly arise during execution are rep- 
resented (conservatively) using a certain family of finite 3-valued logical structures. In 
this setting, an abstract value is a set of 3-valued logical structures. Because the notion 
of abstraction used in [37] is based on logical structures, our results are actually much more broadly applicable than shape-analysis problems. For example, the same kind of abstraction is used in [40] to model concurrency accurately in Java programs that dynamically create objects and threads. In fact, our results apply to any abstraction in which concrete states of a system are represented by finite 2-valued logical structures and abstraction is performed via the mechanisms described in Sections 2 and 3. Throughout the paper, however, we use shape-analysis examples to illustrate the concepts discussed.

The paper investigates the expressiveness of finite 3-valued structures by giving a 
logical characterization of these structures; that is, we examine the question 

For a given 3-valued structure $S$, under what circumstances is it possible to create a formula $\widehat{\gamma}(S)$ such that $S^\natural$ satisfies $\widehat{\gamma}(S)$ exactly when $S^\natural$ is a 2-valued structure that $S$ represents? I.e., $S^\natural \models \widehat{\gamma}(S)$ iff $S$ represents $S^\natural$.

This paper presents two results concerning this question: 

- It is not possible to give a formula $\widehat{\gamma}(S)$ written in first-order logic with transitive closure for an arbitrary structure $S$. However, it is always possible for a well-defined class of 3-valued structures. (This class includes all the 3-valued structures that have been shown to be useful for shape analysis [37].)

- Moreover, it is always possible to give a $\widehat{\gamma}(S)$ in general, using a more powerful formalism, namely, monadic second-order formulas.

The ability to write a formula $\widehat{\gamma}(S)$ that exactly captures what $S$ represents provides a fundamental tool for improving TVLA [27] by the use of symbolic methods. The current TVLA system performs iterative fixed-point computations and yields at every program point a set of 3-valued structures, which represent a superset of all possible stores that can arise at this point in any execution. However, TVLA suffers from two limitations: (i) it is not always as precise as possible (as explained below); (ii) it does not scale to handle large programs, because the worst-case complexity of the algorithm is doubly-exponential in certain parameters (typically, the number of program variables).

The contributions of this paper lay the required groundwork for using symbolic techniques to address both of these limitations. The ability to characterize a 3-valued structure $S$ by a formula $\widehat{\gamma}(S)$ is a key step toward harnessing a standard (2-valued) theorem prover to aid in abstract interpretation:

- Computing the effect of a program statement on an abstract value in the most- 
precise way possible for a given shape-analysis abstraction. 
- Developing a modular shape analysis by using assume-guarantee reasoning. The idea is to allow arbitrary first-order formulas to be used to express pre- and postconditions, thereby enabling the code of each procedure to be analyzed once for all potential contexts. This allows shape analysis to scale and to be applied to applications in which not all the source code is available. This is especially profitable for recursive procedures, since it avoids the need to iterate shape analysis.

These methods are the subject of [42, 25]. 

Another contribution of this paper directly addresses the first of the aforementioned limitations of TVLA's current technique. We give a procedure for extracting information from a 3-valued logical structure $S$ in the most-precise way possible. That is, we give a nonstandard way to check whether a formula $\varphi$ holds in $S$:

- If $\widehat{\gamma}(S) \Rightarrow \varphi$ is valid, i.e., holds in all 2-valued structures, we know that $\varphi$ evaluates to 1 in all the 2-valued structures represented by $S$.

- If $\widehat{\gamma}(S) \Rightarrow \neg\varphi$ is valid, we know that $\varphi$ evaluates to 0 in all the 2-valued structures represented by $S$.

- Otherwise, we know that there exists a 2-valued structure represented by $S$ where $\varphi$ evaluates to 1, and there exists another 2-valued structure represented by $S$ where $\varphi$ evaluates to 0.

This method represents the most-precise way of extracting information from a 3-valued 
logical structure; in particular, whenever this method returns 1/2 (standing for "un- 
known"), any sound method for extracting information from S must also return 1/2. 
This is in contrast with the techniques used in [37], which can return 1/2 even when all 
the 2-valued structures represented by S have the value 1 (or all have the value 0). 

Although the validity question is undecidable for first-order logic with transitive closure, several theorem provers for first-order logic have been created. We report on two experiments in which we used these tools to implement symbolic procedures for extracting information from a 3-valued structure in the most-precise way possible. Also, in [19], we have identified a decidable subset of first-order logic with transitive closure that is useful for shape analysis. We define conditions under which $\widehat{\gamma}$ can be expressed in that logic.

The remainder of the paper is organized as follows. Section 2 defines our terminol- 
ogy, and explains the use of 3-valued structures as abstractions of 2-valued structures. 
Section 3 presents the results on the expressiveness of 3-valued structures, and gives an algorithm for generating $\widehat{\gamma}$ for certain families of 3-valued structures. Section 4 discusses the problem of reading out information from a 3-valued structure in the most-precise way possible. Section 5 discusses the applications of $\widehat{\gamma}$ to program analysis and some implementation issues. Section 6 discusses related work. Appendix A defines an alternative abstract domain for shape analysis, based on canonical abstraction, and the $\widehat{\gamma}$ operation for that domain. Appendix B shows how to characterize general 3-valued structures. Appendix C contains the details for one of the paper's examples. The proofs appear in Appendix D.

2 Preliminaries 

Section 2.1 defines the syntax and standard Tarskian semantics of first-order logic with transitive closure and equality. Section 2.2 introduces integrity formulas, which exclude structures that do not represent a potential store. Section 2.3 introduces 3-valued logical structures, which extend ordinary logical structures with an extra value, 1/2, which represents "unknown" values that arise when several concrete nodes are represented by a single abstract node. The powerset of 3-valued structures forms an abstract domain, which is related to the concrete domain consisting of the powerset of 2-valued structures via embedding, as described in Section 2.4.

Fig. 1(a) shows the declaration of a linked-list data type in C, and Fig. 1(b) shows a C program that searches a list and splices a new element into the list. This program will be used as a running example throughout this paper.

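
(a)

/* list.h */
typedef struct node {
    struct node *n;
    int data;
} *List;

(b)

/* insert.c */
#include <assert.h>
#include <stdlib.h>
#include "list.h"

void insert(List x, int d) {
    List y, t, e;
    /* acyclic_list is a specification predicate; it is not defined here */
    assert(acyclic_list(x) && x != NULL);
    y = x;
    while (y->n != NULL && ...)   /* "...": search condition left unspecified */
        y = y->n;
    t = malloc(sizeof *t);
    t->data = d;
    e = y->n;
    t->n = e;
    y->n = t;
}
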

Fig. 1. (a) Declaration of a linked-list data type in C. (b) A C function that searches a list pointed 
to by parameter x, and splices in a new element. 



2.1 Syntax and Semantics of First-Order Formulas with Transitive Closure 

We represent concrete stores by ordinary 2-valued logical structures over a fixed finite set of predicate symbols $\mathcal{P} = \{eq, p_1, \ldots, p_n\}$, where $eq$ is a designated binary predicate, denoting equality of nodes. We also use maxR to denote the maximal arity of the predicates in $\mathcal{P}$. Without loss of generality, we exclude constant and function symbols from the logic.¹

Example 1. Table 1 lists the set of predicates used in the running example. The unary predicates $x$, $y$, $t$, and $e$ correspond to the program variables x, y, t, and e, respectively. The binary predicate $n$ corresponds to the n fields of List elements. The unary predicate $is$ ("is shared") captures "heap sharing", i.e., List elements pointed to by more than one field. (It was introduced in [4] to capture list and tree data structures.) The unary predicates $r_x$, $r_y$, $r_t$, and $r_e$ hold for heap nodes reachable from the program variables x, y, t, and e, respectively. A heap node $u$ is said to be reachable from a program variable if the variable points to a heap node $u'$, and it is possible to go from $u'$ to $u$ by following zero or more n-links. Reachability is defined in terms of the reflexive transitive closure of the predicate $n$.

¹ Constant symbols can be encoded via unary predicates, and n-ary functions via (n+1)-ary predicates.
The notion of reachability plays a crucial role in defining abstractions that are useful for proving program properties in practice. For instance, it may have the effect of preventing disjoint lists from being collapsed in the abstract representation. This may significantly improve the precision of the answers obtained by a program analysis.
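For a concrete List store, where every node has at most one n successor, $r_q$ can be computed by walking the n-links from the node that q points to. The sketch below assumes a hypothetical array encoding of the store. Nodes are numbered 0 to n_nodes-1, next[u] is the target of u's n field (or -1 for NULL), and head is the node that q points to (or -1 if q is NULL). The function name `reach_from` is illustrative.

```c
#include <stdbool.h>
#include <stddef.h>

/* Computes r[u] = r_q(u) for every node u of a concrete List store.
   Hypothetical encoding: nodes are 0..n_nodes-1, next[u] is the target of
   u's n field or -1 for NULL, and head is the node q points to (-1 if NULL). */
void reach_from(int head, const int *next, size_t n_nodes, bool *r) {
    for (size_t u = 0; u < n_nodes; u++)
        r[u] = false;
    /* Follow zero or more n-links; stop at NULL or at a node already marked,
       which guards against cyclic lists. */
    for (int u = head; u != -1 && !r[u]; u = next[u])
        r[u] = true;
}
```

Starting from head itself reflects the "zero or more n-links" in the definition, which makes the walk compute the reflexive transitive closure $n^*$.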

Predicate      Intended Meaning
eq(v1, v2)     Do v1 and v2 denote the same heap node?
q(v)           Does pointer variable q point to node v?
n(v1, v2)      Does the n field of v1 point to v2?
is(v)          Is v pointed to by more than one field?
r_q(v)         Is node v reachable from q?

Table 1. The set of predicates for representing the stores manipulated by programs that use the List data type from Fig. 1(a). $q$ denotes an arbitrary predicate in the set PVar, which contains a predicate for each program variable of type List. In the case of insert, PVar = {x, y, t, e}.

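We define first-order formulas inductively over the vocabulary $\mathcal{P}$ using the logical connectives $\neg$ and $\vee$, the quantifier $\exists$, and the operator 'TC' in the standard way:

$$\varphi ::= 1 \mid p(v_1, \ldots, v_k) \mid (\neg\varphi_1) \mid (\varphi_1 \vee \varphi_2) \mid (\exists v_1 : \varphi_1) \mid (TC\ v_1, v_2 : \varphi_1)(v_3, v_4)$$

where $p \in \mathcal{P}$; $v_1, v_2, \ldots$ are variables; $\varphi, \varphi_1, \varphi_2$ are formulas.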

The set of free variables of a formula is defined as usual. A formula is closed when it has no free variables. The operator 'TC' denotes transitive closure. If $\varphi_1$ is a formula with free variables $V$, then $(TC\ v_1, v_2 : \varphi_1)(v_3, v_4)$ is a formula with free variables $(V \setminus \{v_1, v_2\}) \cup \{v_3, v_4\}$.

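We use several shorthand notations: $\varphi_1 \Rightarrow \varphi_2 = (\neg\varphi_1 \vee \varphi_2)$; $\varphi_1 \wedge \varphi_2 = \neg(\neg\varphi_1 \vee \neg\varphi_2)$; $\varphi_1 \Leftrightarrow \varphi_2 = (\varphi_1 \Rightarrow \varphi_2) \wedge (\varphi_2 \Rightarrow \varphi_1)$; and $\forall v : \varphi = \neg\exists v : \neg\varphi$. The transitive closure of a binary predicate $p$ is $p^+(v_3, v_4) = (TC\ v_1, v_2 : p(v_1, v_2))(v_3, v_4)$. The reflexive transitive closure of a binary predicate $p$ is $p^*(v_3, v_4) = ((TC\ v_1, v_2 : p(v_1, v_2))(v_3, v_4)) \vee eq(v_3, v_4)$. The order of precedence among the connectives, from highest to lowest, is as follows: $\neg$, $\wedge$, $\vee$, 'TC', $\forall$, and $\exists$. We drop parentheses wherever possible, except for emphasis.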
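When the body of 'TC' is a binary predicate $p$ over a finite structure, $p^+$ and $p^*$ can be computed directly. The following is a minimal sketch using Warshall's algorithm on a Boolean adjacency matrix. The bound MAXN and the function names are illustrative assumptions, not part of the logic.

```c
#include <stdbool.h>

#define MAXN 8  /* illustrative bound on the number of nodes */

/* tc[i][j] = p^+(i, j): there is a path of one or more p-edges from i to j.
   Warshall's algorithm, O(n^3). */
void transitive_closure(int n, bool p[MAXN][MAXN], bool tc[MAXN][MAXN]) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            tc[i][j] = p[i][j];
    for (int k = 0; k < n; k++)
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                if (tc[i][k] && tc[k][j])
                    tc[i][j] = true;
}

/* rtc[i][j] = p^*(i, j) = p^+(i, j) or eq(i, j). */
void reflexive_transitive_closure(int n, bool p[MAXN][MAXN], bool rtc[MAXN][MAXN]) {
    transitive_closure(n, p, rtc);
    for (int i = 0; i < n; i++)
        rtc[i][i] = true;
}
```

Without a cycle through node $i$, $p^+(i, i)$ is false, while $p^*(i, i)$ always holds because of the $eq$ disjunct.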

Definition 1. (2-valued Logical Structures) Let $\mathcal{P}_i$ denote the set of predicate symbols with arity $i$. A logical structure over $\mathcal{P}$ is a pair $S = (U, \iota)$ in which

- $U$ is a (possibly infinite) set of nodes.

- $\iota$ is the interpretation of predicate symbols, i.e., for every predicate symbol $p \in \mathcal{P}_i$, $\iota(p) : U^i \to \{0, 1\}$ determines the tuples for which $p$ holds. Also, $\iota(eq)$ is the interpretation of equality, i.e., $\iota(eq)(u_1, u_2) = 1$ iff $u_1 = u_2$.

Below we define the standard Tarskian semantics for first-order logic. 

Definition 2. (Semantics of First-Order Logical Formulas) Consider a logical structure $S = (U, \iota)$. An assignment $Z$ is a function that maps free variables to nodes (i.e., an assignment has the functionality $Z : \{v_1, v_2, \ldots\} \to U$). An assignment that is defined on all free variables of a formula $\varphi$ is called complete for $\varphi$. In the sequel, we assume that every assignment $Z$ that arises in connection with the discussion of some formula $\varphi$ is complete for $\varphi$. We say that $S$ and $Z$ satisfy a formula $\varphi$ (denoted by $S, Z \models \varphi$) when one of the following holds:

- $\varphi = 1$.

- $\varphi = p(v_1, v_2, \ldots, v_k)$ and $\iota(p)(Z(v_1), Z(v_2), \ldots, Z(v_k)) = 1$.

- $\varphi = \neg\varphi_0$ and $S, Z \models \varphi_0$ does not hold.

- $\varphi = \varphi_1 \vee \varphi_2$, and either $S, Z \models \varphi_1$ or $S, Z \models \varphi_2$.

- $\varphi = \exists v_1 : \varphi_1$ and there exists a node $u \in U$ such that $S, Z[v_1 \mapsto u] \models \varphi_1$.

- $\varphi = (TC\ v_1, v_2 : \varphi_1)(v_3, v_4)$ and there exist $u_1, u_2, \ldots, u_m \in U$, $m \geq 2$, such that $Z(v_3) = u_1$, $Z(v_4) = u_m$, and for all $1 \leq i < m$, $S, Z[v_1 \mapsto u_i, v_2 \mapsto u_{i+1}] \models \varphi_1$.

For a closed formula $\varphi$, we will omit the assignment in the satisfaction relation, and merely write $S \models \varphi$.

2.2 Integrity Formula 

Because not all logical structures represent stores, we use a designated closed formula $F$, called the integrity formula,² to exclude structures that are not of interest; in our application, such structures are ones that do not correspond to possible stores. This allows us to restrict the set of structures to the ones satisfying $F$.

Definition 3. A structure $S$ is admissible if $S \models F$.

In the rest of the paper, we assume that we work with a fixed integrity formula $F$. All our notations are parameterized by $\mathcal{P}$ and $F$.

Example 2. For the List data type, there are four conditions that define the admissible 
structures. At any time during execution, 

(a) each program variable can point to at most one heap node. 

(b) the n field of a heap node can point to at most one heap node. 

(c) predicate is ("is shared") holds for exactly those nodes that have two or more pre- 
decessors. 

(d) the reachability predicate for each variable q holds for exactly those nodes that are reachable from program variable q.

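The set PVar contains a predicate for each program variable of type List; in the case of insert, PVar = {x, y, t, e}. Thus, the integrity formula $F_{List}$ for the List data type is:

$$\begin{aligned}
& \textstyle\bigwedge_{q \in PVar} \forall v_1, v_2 : q(v_1) \wedge q(v_2) \Rightarrow eq(v_1, v_2) && \text{(a)}\\
\wedge\ & \forall v, v_1, v_2 : n(v, v_1) \wedge n(v, v_2) \Rightarrow eq(v_1, v_2) && \text{(b)}\\
\wedge\ & \forall v : is(v) \Leftrightarrow \exists v_1, v_2 : \neg eq(v_1, v_2) \wedge n(v_1, v) \wedge n(v_2, v) && \text{(c)}\\
\wedge\ & \textstyle\bigwedge_{q \in PVar} \forall v : r_q(v) \Leftrightarrow \exists v_1 : q(v_1) \wedge n^*(v_1, v) && \text{(d)}
\end{aligned}$$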
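For a finite 2-valued store, the four conjuncts of $F_{List}$ can be checked directly. The sketch below assumes a hypothetical encoding of a List store as Boolean arrays, with one row of `var` and `r` per variable in PVar in the order x, y, t, e. It computes $n^*$ with Warshall's algorithm. This illustrates the integrity formula only and is not TVLA code.

```c
#include <stdbool.h>

#define MAXN 8   /* illustrative bound on the number of heap nodes */
#define NVARS 4  /* x, y, t, e */

/* Hypothetical encoding of a 2-valued List store. */
typedef struct {
    int n_nodes;
    bool var[NVARS][MAXN];   /* var[q][v]: q(v)  */
    bool n[MAXN][MAXN];      /* n(v1, v2)        */
    bool is[MAXN];           /* is(v)            */
    bool r[NVARS][MAXN];     /* r_q(v)           */
} store;

/* Returns true iff s satisfies conjuncts (a)-(d) of F_List. */
bool admissible(const store *s) {
    int N = s->n_nodes;
    for (int v = 0; v < N; v++) {
        int out = 0, in = 0;
        for (int w = 0; w < N; w++) {
            out += s->n[v][w];
            in += s->n[w][v];
        }
        if (out > 1) return false;                  /* (b) */
        if (s->is[v] != (in >= 2)) return false;    /* (c) */
    }
    /* n^*: start from eq or n, then close transitively (Warshall). */
    bool rtc[MAXN][MAXN];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            rtc[i][j] = (i == j) || s->n[i][j];
    for (int k = 0; k < N; k++)
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                if (rtc[i][k] && rtc[k][j]) rtc[i][j] = true;
    for (int q = 0; q < NVARS; q++) {
        int targets = 0;
        for (int v = 0; v < N; v++) targets += s->var[q][v];
        if (targets > 1) return false;              /* (a) */
        for (int v = 0; v < N; v++) {
            bool reach = false;
            for (int v1 = 0; v1 < N; v1++)
                if (s->var[q][v1] && rtc[v1][v]) reach = true;
            if (s->r[q][v] != reach) return false;  /* (d) */
        }
    }
    return true;
}
```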



² In [37] these are called "hygiene conditions".
2.3 3-Valued Logical Structures and Embedding

In this section, we define 3-valued logical structures, which provide a way to represent a set of 2-valued logical structures in a compact and conservative way.

We say that the values 0 and 1 are definite values and that 1/2 is an indefinite value, and define a partial order $\sqsubseteq$ on truth values to reflect information content. $l_1 \sqsubseteq l_2$ denotes that $l_1$ possibly has more definite information than $l_2$:

Definition 4. [Information Order]. For $l_1, l_2 \in \{0, 1/2, 1\}$, we define the information order on truth values as follows: $l_1 \sqsubseteq l_2$ iff $l_1 = l_2$ or $l_2 = 1/2$.
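The information order and its least upper bound are short enough to state in code. This is a small sketch with a hypothetical enum encoding, where HALF stands for 1/2. `info_join` combines two values into the least value that lies above both in $\sqsubseteq$.

```c
#include <stdbool.h>

/* Truth values of 3-valued logic; HALF stands for 1/2 ("unknown"). */
typedef enum { ZERO, HALF, ONE } tv;

/* Information order (Definition 4): l1 is below l2 iff l1 == l2 or l2 == 1/2. */
bool info_leq(tv l1, tv l2) {
    return l1 == l2 || l2 == HALF;
}

/* Least upper bound in the information order: equal values are kept, and
   distinct values can only be reconciled by 1/2. */
tv info_join(tv l1, tv l2) {
    return l1 == l2 ? l1 : HALF;
}
```

In this order, 0 and 1 are incomparable and both lie below 1/2, so joining 0 with 1 yields 1/2.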

Definition 5. A 3-valued logical structure over $\mathcal{P}$ is the generalization of 2-valued structures given in Definition 1, in that predicates may have the value 1/2. This means that $S = (U, \iota)$ where for $p \in \mathcal{P}_i$, $\iota(p) : U^i \to \{0, 1, 1/2\}$. In addition, (i) for all $u \in U$, $1 \sqsubseteq \iota^S(eq)(u, u)$, and (ii) for all $u_1, u_2 \in U$ such that $u_1$ and $u_2$ are distinct nodes, $\iota^S(eq)(u_1, u_2) = 0$.

A node $u \in U$ having $\iota^S(eq)(u, u) = 1/2$ is called a summary node. As we shall see, such a node may represent more than one node from a given 2-valued structure.

We denote the set of 2-valued logical structures by 2-STRUCT[$\mathcal{P}$]. The set of 3-valued logical structures is denoted by 3-STRUCT[$\mathcal{P}$].

A 3-valued structure can be depicted as a directed graph, with nodes as graph nodes. A unary predicate $p$ is represented in the graph by having a solid arrow from the predicate name $p$ to node $u$ for each node $u$ for which $\iota(p)(u) = 1$. An arrow between two nodes indicates whether a binary predicate holds for the corresponding pair of nodes. An indefinite value of a predicate is shown by a dotted arrow; the value 1 is shown by a solid arrow; and the value 0 is shown by the absence of an arrow.

Example 3. Fig. 2(d) shows a 3-valued structure that represents possible inputs of the insert program. This structure represents all lists that are pointed to by program variable x and have at least two elements. The structure has 2 nodes, $u_1$ and $u_2$, where $u_1$ is the head of the list pointed to by x, and $u_2$ is a summary node (drawn as a double circle), which represents the tail of the list. Predicate $r_x$ holds for $u_1$ and $u_2$, indicating that all elements of the list are reachable from x. Other unary predicates are not shown, indicating that their values are 0 for all nodes, i.e., the program variables y, e, and t are NULL, and there is no sharing in the list. The dotted edge from $u_1$ to $u_2$ indicates that there may be n-links from the head of the list to some elements in the tail. In fact, the $(u_1, u_2)$-edge represents exactly one n-link that points to exactly one list element, because of conjunct (b) of the integrity formula in Example 2. In contrast, the dotted self-loop on $u_2$ represents all n-links that may occur in the tail.

2.4 Embedding Order 

We define the embedding ordering on structures as follows: 

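Definition 6. Let $S = (U^S, \iota^S)$ and $S' = (U^{S'}, \iota^{S'})$ be two logical structures, and let $f : U^S \to U^{S'}$ be a surjective function. We say that $f$ embeds $S$ in $S'$ (denoted by $S \sqsubseteq^f S'$) if for every predicate symbol $p \in \mathcal{P}_i$ and all $u_1, \ldots, u_i \in U^S$,

$$\iota^S(p)(u_1, \ldots, u_i) \sqsubseteq \iota^{S'}(p)(f(u_1), \ldots, f(u_i)) \qquad (1)$$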
[Fig. 2 image: panels (a), (b), (c) show the 2-valued structures $S_a$, $S_b$, $S_c$; panel (d) shows the 3-valued structure $S$.]

Fig. 2. (a), (b), (c) Examples of 2-valued structures representing linked lists that are pointed to by program variable x, of length 2, 3, and 4, respectively. (d) $S$ represents all lists that are pointed to by program variable x and that have at least two elements, including the lists represented by (a)-(c).

We say that $S$ can be embedded in $S'$ (denoted by $S \sqsubseteq S'$) if there exists a function $f$ such that $S \sqsubseteq^f S'$.
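Definition 6 can be checked mechanically for small structures. The sketch below assumes a hypothetical fixed vocabulary consisting of $eq$, NUN unary predicates, and the single binary predicate $n$ (maximal arity 2, as in the List example). `embeds` tests whether a given array f embeds s in t. `can_embed` then searches all functions exhaustively, which is exponential and only meant for tiny examples such as Fig. 2.

```c
#include <stdbool.h>

#define MAXN 8  /* illustrative bound on nodes per structure */
#define NUN 2   /* illustrative number of unary predicates */

typedef enum { ZERO, HALF, ONE } tv;   /* HALF stands for 1/2 */

/* A structure over eq, NUN unary predicates, and the binary predicate n.
   A 2-valued structure simply uses no HALF values. */
typedef struct {
    int n_nodes;
    tv eq[MAXN][MAXN];
    tv un[NUN][MAXN];
    tv n[MAXN][MAXN];
} structure;

/* Information order: a is below b iff a == b or b == 1/2. */
static bool leq(tv a, tv b) { return a == b || b == HALF; }

/* Does f (f[u] is a node of t, for each node u of s) embed s in t? */
bool embeds(const structure *s, const structure *t, const int *f) {
    bool hit[MAXN] = {false};
    for (int u = 0; u < s->n_nodes; u++)
        hit[f[u]] = true;
    for (int v = 0; v < t->n_nodes; v++)
        if (!hit[v])
            return false;                       /* f is not surjective */
    for (int u = 0; u < s->n_nodes; u++) {
        for (int p = 0; p < NUN; p++)
            if (!leq(s->un[p][u], t->un[p][f[u]]))
                return false;                   /* Eq. (1), unary p */
        for (int w = 0; w < s->n_nodes; w++)
            if (!leq(s->eq[u][w], t->eq[f[u]][f[w]]) ||
                !leq(s->n[u][w], t->n[f[u]][f[w]]))
                return false;                   /* Eq. (1), eq and n */
    }
    return true;
}

/* Recursively tries every f : U^s -> U^t; exponential in s->n_nodes. */
static bool search(const structure *s, const structure *t, int *f, int u) {
    if (u == s->n_nodes)
        return embeds(s, t, f);
    for (int v = 0; v < t->n_nodes; v++) {
        f[u] = v;
        if (search(s, t, f, u + 1))
            return true;
    }
    return false;
}

/* s can be embedded in t iff some f embeds it. */
bool can_embed(const structure *s, const structure *t) {
    int f[MAXN];
    return search(s, t, f, 0);
}
```

For example, encode $S$ of Fig. 2(d) with unary predicates x and $r_x$, and encode the three-node list $S_b$. Then the map f = {0, 1, 1} is accepted, which matches Example 4. The $eq$ check is what forces two distinct concrete nodes to map only to a summary node.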



Example 4. Fig. 2(a)-(c) show some of the 2-valued structures that can be embedded into the 3-valued structure $S$ shown in Fig. 2(d). The function that embeds $S_a$ into $S$ maps the node $u_i^a \in U^{S_a}$ to $u_i \in U^S$, for $i = 1, 2$. The function that embeds $S_b$ into $S$ maps the node $u_1^b \in U^{S_b}$ to $u_1 \in U^S$, and both $u_2^b, u_3^b \in U^{S_b}$ to $u_2 \in U^S$. Also, Eq. (1) holds, because whenever a predicate has a definite value in $S$, the corresponding predicate in $S_b$ has the same value. For example, $\iota^S(x)(u_2)$ is 0 and $f(u_2^b) = f(u_3^b) = u_2$, and both $\iota^{S_b}(x)(u_2^b)$ and $\iota^{S_b}(x)(u_3^b)$ are 0. Similarly, $\iota^S(r_x)(u_2) = 1$, and both $\iota^{S_b}(r_x)(u_2^b)$ and $\iota^{S_b}(r_x)(u_3^b)$ are 1. For a binary predicate, $\iota^S(n)(u_2, u_1) = 0$, and both $\iota^{S_b}(n)(u_2^b, u_1^b)$ and $\iota^{S_b}(n)(u_3^b, u_1^b)$ are 0.



Remark. Embedding can be viewed as a variant of homomorphism [13]. In cases where $S$ is a 2-valued structure (i.e., all predicates in $S$ have definite values, including $eq$, which is interpreted as standard equality), checking whether a 2-valued structure $S'$ embeds into $S$ is equivalent to checking whether there is an isomorphism between $S'$ and $S$. In cases where all nodes in $S$ are summary nodes (i.e., for all $u \in U^S$, $\iota^S(eq)(u, u) = 1/2$), and all other values of predicates are definite, embedding is equivalent to strong homomorphism. In cases where all nodes in $S$ are summary nodes and all other values of predicates are either 0 or 1/2, embedding is equivalent to homomorphism. In all other cases, i.e., when a predicate value for some tuple in $S$ is 1, embedding generalizes the notion of homomorphism.

Remark. In Definition 6, we require that $f$ be surjective in order to guarantee that a quantified formula, such as $\exists v : \varphi$, has consistent values in two 3-valued structures $S$ and $S'$ related by embedding. For example, if $f$ were not surjective, then there could exist an individual $u' \in U^{S'}$, not in the range of $f$, such that the value of $\varphi$ on $S'$ is 1 when $v$ is assigned to $u'$. This would permit there to be structures $S$ and $S'$ for which the value of $\exists v : \varphi$ on $S$ is 0 but its value on $S'$ is 1.

Concretization of 3-Valued Structures. Embedding allows us to define the (potentially infinite) set of concrete structures that a set of 3-valued structures represents:

Definition 7. (Concretization of 3-Valued Structures) For a set of structures $X \subseteq$ 3-STRUCT[$\mathcal{P}$], we denote by $\gamma(X)$ the set of 2-valued structures that $X$ represents, i.e.,

$$\gamma(X) = \{S^\natural \in \text{2-STRUCT}[\mathcal{P}] \mid \text{exists } S \in X \text{ such that } S^\natural \sqsubseteq S \text{ and } S^\natural \models F\} \qquad (2)$$

Also, for a singleton set $X = \{S\}$ we write $\gamma(S)$ instead of $\gamma(\{S\})$.

Example 5. Example 4 shows that $S_a \sqsubseteq S$, $S_b \sqsubseteq S$, and $S_c \sqsubseteq S$ for the 2-valued structures in Figs. 2(a)-(c); also, the integrity formula is satisfied for $S_a$, $S_b$, and $S_c$. Therefore, $S_a$, $S_b$, and $S_c$ are in the concretization of the 3-valued structure $S$: $S_a, S_b, S_c \in \gamma(S)$. Note that the indefinite values of predicates in $S$ allow the corresponding values in $S_b$ to be either 0 or 1. In particular, $\iota^S(eq)(u_2, u_2) = 1/2$ reflects the fact that the abstract node $u_2$ may represent more than one concrete node. Indeed, $S_b$ contains two nodes, $u_2^b$ and $u_3^b$, that are represented by $u_2 \in U^S$. Also, $\iota^{S_b}(eq)(u_2^b, u_3^b) = 0$, but $\iota^{S_b}(eq)(u_2^b, u_2^b) = 1$.

The abstract domain we consider is the powerset of 3-valued structures, where the ordering relation $\sqsubseteq$ is defined as follows: for every two sets of 3-valued structures $X_1$ and $X_2$, $X_1 \sqsubseteq X_2$ iff for all $S_1 \in X_1$ there exists $S_2 \in X_2$ such that $S_1$ is embedded into $S_2$.

The Analysis Technique. The TVLA system [27] carries out an abstract interpretation [7] to collect a set of structures at each program point $p$. This involves finding the least fixed point of a certain set of equations. To ensure termination, the analysis is carried out with respect to a finite abstract domain, that is, the set of different structures is finite. When the fixed point is reached, the structures that have been collected at program point $p$ describe a superset of all the concrete stores that can occur at $p$. To determine whether a query is always satisfied at $p$, one checks whether it holds in all of the structures that were collected there. Instantiations of this framework are capable of establishing nontrivial properties of programs that perform complex pointer-based manipulations of a priori unbounded-size heap-allocated data structures.
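The least-fixed-point computation can be illustrated in a much simpler setting than TVLA's. The sketch below assumes a finite powerset domain encoded as 32-bit sets, with union as the join and a monotone transfer function. `example_step` is a hypothetical transfer function that adds 0 and the successor of every element below 4. TVLA's actual domain is sets of 3-valued structures, and this code does not model it.

```c
#include <stdint.h>

typedef uint32_t set;   /* subsets of {0, ..., 31} */

/* Kleene iteration x_{k+1} = x_k | F(x_k) from the empty set. For a monotone
   F this reaches the least fixed point of F; it terminates because the chain
   x_0 <= x_1 <= ... can grow at most 32 times. */
set least_fixed_point(set (*F)(set)) {
    set x = 0;
    for (;;) {
        set next = x | F(x);
        if (next == x)
            return x;
        x = next;
    }
}

/* Hypothetical transfer function: element 0 is always present, and each
   element i < 4 contributes its successor i + 1. */
set example_step(set x) {
    set out = 1u;
    for (int i = 0; i < 4; i++)
        if (x & (1u << i))
            out |= 1u << (i + 1);
    return out;
}
```

Here least_fixed_point(example_step) reaches {0, 1, 2, 3, 4} after five growing steps. Finiteness of the domain plays the same role as the finite set of bounded structures in TVLA: the ascending chain cannot grow forever.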

3 Characterizing 3-Valued Structures by First-Order Formulas 

This section presents our results on characterizing 3-valued structures using first-order formulas. Given a 3-valued structure $S$, the question that we wish to answer is whether it is possible to give a formula $\widehat{\gamma}(S)$ that accepts exactly the set of 2-valued structures that $S$ represents, i.e., $S^\natural \models \widehat{\gamma}(S)$ iff $S^\natural \in \gamma(S)$.

This question has different answers depending on what assumptions are made. The task of generating a characteristic formula for a 3-valued structure $S$ is challenging because we have to find a formula that identifies when embedding is possible, i.e., that is satisfied by exactly those 2-valued structures that embed into $S$. It is not always possible to characterize an arbitrary 3-valued structure by a first-order formula, i.e., there exists a 3-valued structure $S$ for which there is no first-order formula with transitive closure that accepts exactly the set of 2-valued structures $\gamma(S)$.

For example, consider the 3-valued structure $S$ shown in Fig. 3. The absence of a self-loop on any of the three summary nodes implies that a 2-valued structure can be embedded into this structure if and only if it can be colored using 3 colors (Lemma D1 in the appendix). It is well-known that there exists no first-order formula, even with transitive closure, that expresses 3-colorability of undirected graphs, unless P = NP (e.g., see [18, 6]).³ Therefore, there is no first-order formula that accepts exactly the set $\gamma(S)$.




Fig. 3. A 3-valued structure that represents 3-colorable undirected graphs. A 2-valued structure 
can be embedded into this structure if and only if it can be colored using 3 colors. 
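Although no first-order formula captures 3-colorability (under the stated complexity assumption), checking it for one given finite graph is easy by exhaustive search. The backtracking sketch below assumes a simple undirected graph given as a symmetric Boolean adjacency matrix without self-loops. The names and the bound MAXV are illustrative. Each color can be read as one of the three summary nodes of Fig. 3.

```c
#include <stdbool.h>

#define MAXV 8  /* illustrative bound on graph size */

/* Tries colors 0..2 for vertex v, given colors for vertices 0..v-1. */
static bool color_from(int n, bool adj[MAXV][MAXV], int *col, int v) {
    if (v == n)
        return true;
    for (int c = 0; c < 3; c++) {
        bool ok = true;
        for (int w = 0; w < v; w++)
            if (adj[v][w] && col[w] == c)
                ok = false;   /* adjacent vertices may not share a color */
        if (ok) {
            col[v] = c;
            if (color_from(n, adj, col, v + 1))
                return true;
        }
    }
    return false;             /* no color works: backtrack */
}

/* Returns true iff the graph on vertices 0..n-1 is 3-colorable. */
bool three_colorable(int n, bool adj[MAXV][MAXV]) {
    int col[MAXV];
    return color_from(n, adj, col, 0);
}
```

The search is exponential in the worst case, which is consistent with 3-colorability being NP-complete.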

3.1 FO-Identifiable Structures 

Intuitively, the difficulty in characterizing 3-valued structures is how to uniquely iden- 
tify the correspondence between concrete and abstract nodes using a first-order formula. 
Fortunately, as we will see, for the subclass of 3-valued structures used in shape analysis 
(also known as "bounded structures"), the correspondence can be easily defined using 
first-order formulas. The bounded structures are a subclass of the 3-valued structures in 
which it is possible to identify uniquely each node using a first-order formula. 

Definition 8. A 3-valued structure $S$ is called FO-identifiable if for every node $u \in U^S$ there exists a first-order formula $node^S_u(w)$ with designated free variable $w$ such that for every 2-valued structure $S^\natural$ that embeds into $S$ using a function $f$, for every concrete node $u^\natural \in U^{S^\natural}$, and for every node $u_i \in U^S$:

$$f(u^\natural) = u_i \iff S^\natural, [w \mapsto u^\natural] \models node^S_{u_i}(w) \qquad (3)$$

The idea behind this definition is to have a formula that uniquely identifies each node $u$ of the 3-valued structure $S$. This will be used to identify the set of nodes of a 2-valued structure that are mapped to $u$ by embedding. In other words, a concrete node satisfies the node formula of at most one abstract node, as formalized by the following lemma:

³ In fact, the condition is even stronger. First-order logic with transitive closure can only express nondeterministic logspace (NL) computations; thus, the NP-complete problem of 3-colorability is not expressible in first-order logic with transitive closure unless NL = NP. This is shown in [18] using an ordering relation on the nodes. In our context, without the ordering, the logic is less expressive. Thus, the condition under which 3-colorability is expressible is even stronger than NL = NP. We believe that there is an example of a 3-valued structure that is not expressible in the logic, independently of the question whether P = NP. However, this is not the main focus of the current paper.
Lemma 1. Let $S$ be an FO-identifiable structure, and let $u_1, u_2 \in U^S$ be distinct nodes. Let $S^\natural$ be a 2-valued structure that embeds into $S$ and let $u^\natural \in U^{S^\natural}$. At most one of the following holds:

1. $S^\natural, [w \mapsto u^\natural] \models node^S_{u_1}(w)$

2. $S^\natural, [w \mapsto u^\natural] \models node^S_{u_2}(w)$

Remark. Definition 8 can be generalized to handle arbitrary 2-valued structures, by also allowing extra designated free variables for every concrete node and using equality to check if the concrete node is equal to the designated variable: $node^S_{u_i}(w, w_1, \ldots, w_n) = (w = w_i)$. However, the equality formula cannot be used to identify nodes in a 3-valued structure because equality evaluates to 1/2 on summary nodes.

We now introduce a standard concept for turning valuations into formulas. 

Definition 9. For a predicate $p$ of arity $k$ and truth value $B \in \{0, 1, 1/2\}$, we define the formula $p^B(v_1, v_2, \ldots, v_k)$ to be the characteristic formula of $B$ for $p$, by

$$\begin{aligned}
p^0(v_1, v_2, \ldots, v_k) &= \neg p(v_1, v_2, \ldots, v_k)\\
p^1(v_1, v_2, \ldots, v_k) &= p(v_1, v_2, \ldots, v_k)\\
p^{1/2}(v_1, v_2, \ldots, v_k) &= 1
\end{aligned}$$

The main idea in the above definition is that, for $B \in \{0, 1\}$, $p^B$ holds when the value of $p$ is $B$, and for $B = 1/2$ the value of $p$ is unrestricted. This is formalized by the following lemma:

Lemma 2. For every 2-valued structure $S^\natural$ and assignment $Z$,

$$S^\natural, Z \models p^B(v_1, \ldots, v_k) \iff \iota^{S^\natural}(p)(Z(v_1), \ldots, Z(v_k)) \sqsubseteq B$$
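Definition 9 and Lemma 2 can be cross-checked one value at a time. The sketch below uses a hypothetical enum encoding of truth values. `char_formula_holds` evaluates $p^B$ on a tuple whose 2-valued value of $p$ is known. `lemma2_rhs` computes the right-hand side of Lemma 2 through the information order.

```c
#include <stdbool.h>

typedef enum { ZERO, HALF, ONE } tv;   /* HALF stands for 1/2 */

/* Value of the characteristic formula p^B (Definition 9) on a tuple whose
   2-valued value of p is `value`. */
bool char_formula_holds(bool value, tv B) {
    switch (B) {
    case ZERO: return !value;   /* p^0 = not p   */
    case ONE:  return value;    /* p^1 = p       */
    default:   return true;     /* p^{1/2} = 1   */
    }
}

/* Right-hand side of Lemma 2: the 2-valued value lies below B in the
   information order. */
bool lemma2_rhs(bool value, tv B) {
    tv v = value ? ONE : ZERO;
    return v == B || B == HALF;
}
```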

Definition 8 is not a constructive definition, because the premises range over arbi- 
trary 2-valued structures and arbitrary embedding functions. For this reason, we now 
introduce a testable condition that implies FO-identifiability. 

Bounded Structures. The following subclass of 3-valued structures was defined in [36];⁴ the motivation there was to guarantee that shape analysis was carried out with respect to a finite set of abstract structures, and hence that the analysis would always terminate.

Definition 10. A bounded structure over vocabulary $\mathcal{P}$ is a structure $S = (U^S, \iota^S)$ such that for every $u_1, u_2 \in U^S$, where $u_1 \neq u_2$, there exists a predicate symbol $p \in \mathcal{P}_1$ such that (i) $\iota^S(p)(u_1) \neq \iota^S(p)(u_2)$ and (ii) neither $\iota^S(p)(u_1)$ nor $\iota^S(p)(u_2)$ is 1/2.

Intuitively, every pair of nodes in a bounded structure is distinguished by at least one unary predicate that has different definite values on the two nodes. Consequently, a bounded structure has at most $2^{|\mathcal{P}_1|}$ nodes, and there is a finite number of different bounded structures (up to isomorphism).
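Definition 10 is directly testable, which is what makes it useful as a sufficient condition for FO-identifiability. A minimal Python sketch of the test follows, assuming each node is given as a dictionary from unary predicate names to values in {0, 1, 0.5} (0.5 standing for 1/2). The two example structures are hypothetical excerpts that keep only x and $r_x$; they mimic the situation in Fig. 2(d), where $u_1$ and $u_2$ differ on x, and in Fig. 4, where $u_2$ and $u_3$ agree on all unary predicates.

```python
from itertools import combinations

HALF = 0.5  # indefinite truth value 1/2


def distinguishes(p, a, b):
    """p has different *definite* values on the two nodes (Definition 10 (i) and (ii))."""
    return a[p] != HALF and b[p] != HALF and a[p] != b[p]


def is_bounded(unary):
    """unary maps each node to a dict {unary predicate: value in {0, 1, HALF}}.

    Assumes every node has a value for every unary predicate.
    """
    for u1, u2 in combinations(unary, 2):
        a, b = unary[u1], unary[u2]
        if not any(distinguishes(p, a, b) for p in a):
            return False
    return True


# Hypothetical excerpts (only x and r_x) of the structures in Fig. 2(d) and Fig. 4.
s_fig2 = {"u1": {"x": 1, "r_x": 1}, "u2": {"x": 0, "r_x": 1}}
s_fig4 = {"u1": {"x": 1, "r_x": 1}, "u2": {"x": 0, "r_x": 1}, "u3": {"x": 0, "r_x": 1}}
print(is_bounded(s_fig2), is_bounded(s_fig4))  # prints True False
```

The third test case below shows why requirement (ii) matters: a predicate that is 1/2 on one of the nodes does not separate them.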

The following lemma shows that bounded structures are FO-identifiable using formulas over unary predicates only (denoted by $\mathcal{P}_1$):

This definition of bounded structures was given in [36]; it is slightly more restrictive than the one given in [37, 26], which did not impose requirement 10(ii). However, this does not limit the set of problems handled by our method if the structure that is bounded in the weak sense is also FO-identifiable.
Lemma 3. Every bounded 3-valued structure S is FO-identifiable, where

$$\mathit{node}^{S}_{u}(w) \stackrel{\text{def}}{=} \bigwedge_{p \in \mathcal{P}_1} p^{\iota^{S}(p)(u)}(w) \qquad (4)$$

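Example 6. The first-order node formulas for the structure S shown in Fig. 2 are:

$$\mathit{node}^{S}_{u_1}(w) = x(w) \wedge r_x(w) \wedge \neg y(w) \wedge \neg t(w) \wedge \neg e(w) \wedge \neg r_y(w) \wedge \neg r_t(w) \wedge \neg r_e(w) \wedge \neg is(w)$$

$$\mathit{node}^{S}_{u_2}(w) = \neg x(w) \wedge r_x(w) \wedge \neg y(w) \wedge \neg t(w) \wedge \neg e(w) \wedge \neg r_y(w) \wedge \neg r_t(w) \wedge \neg r_e(w) \wedge \neg is(w)$$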
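The construction of Eq. (4) is mechanical, so the formulas of Example 6 can be generated from the unary values of $u_1$ and $u_2$. The following Python sketch does this as string manipulation and also evaluates a node formula on a concrete node given by its 2-valued unary values; node_formula and satisfies_node_formula are illustrative names, not part of TVLA.

```python
HALF = 0.5  # indefinite truth value 1/2


def node_formula(values, var="w"):
    """Build node^S_u(var) of Eq. (4) from the unary values of an abstract node u."""
    literals = []
    for pred, value in values.items():
        if value == 1:
            literals.append(f"{pred}({var})")   # p^1 = p
        elif value == 0:
            literals.append(f"¬{pred}({var})")  # p^0 = ¬p
        # value == HALF contributes p^{1/2} = 1, i.e., no literal
    return " ∧ ".join(literals) if literals else "1"


def satisfies_node_formula(concrete, values):
    """Evaluate node^S_u on a concrete node given as {pred: 0 or 1}."""
    return all(v == HALF or concrete[p] == v for p, v in values.items())


ORDER = ["x", "r_x", "y", "t", "e", "r_y", "r_t", "r_e", "is"]
u1 = {p: (1 if p in ("x", "r_x") else 0) for p in ORDER}
u2 = {p: (1 if p == "r_x" else 0) for p in ORDER}
print(node_formula(u1))
# prints x(w) ∧ r_x(w) ∧ ¬y(w) ∧ ¬t(w) ∧ ¬e(w) ∧ ¬r_y(w) ∧ ¬r_t(w) ∧ ¬r_e(w) ∧ ¬is(w)
```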

Remark. In the case that S is a bounded 2-valued structure, the definition of a bounded structure becomes trivial. The reason is that every node in S can be named by a quantifier-free formula built from unary predicates. This is essentially the same as saying that every node can be named by a constant. If structure S' embeds into S, then S' must be isomorphic to S; therefore, it is possible to name all nodes of S' by the same constants. However, this restricted case is not of particular interest for us, because, to guarantee termination, shape analysis operates on structures that contain summary nodes and indefinite values. In the case that S contains a summary node, a structure S' that embeds into S may have an unbounded number of nodes; hence, the nodes of S' cannot be named by a finite set of constants in the language.

We already know of interesting cases of FO-identifiable structures that are not bounded, which can be used to generalize the abstraction defined in [36]:

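Example 7. The 3-valued structure S' in Fig. 4 is FO-identifiable by:

$$\mathit{node}^{S'}_{u_1}(w) = x(w) \wedge r_x(w) \wedge \neg y(w) \wedge \neg t(w) \wedge \neg e(w) \wedge \neg r_y(w) \wedge \neg r_t(w) \wedge \neg r_e(w) \wedge \neg is(w)$$

$$\mathit{node}^{S'}_{u_2}(w) = \underline{\exists w_1 : x(w_1) \wedge n(w_1, w)} \wedge \neg x(w) \wedge r_x(w) \wedge \neg y(w) \wedge \neg t(w) \wedge \neg e(w) \wedge \neg r_y(w) \wedge \neg r_t(w) \wedge \neg r_e(w) \wedge \neg is(w)$$

$$\mathit{node}^{S'}_{u_3}(w) = \neg(\exists w_1 : x(w_1) \wedge n(w_1, w)) \wedge \neg x(w) \wedge r_x(w) \wedge \neg y(w) \wedge \neg t(w) \wedge \neg e(w) \wedge \neg r_y(w) \wedge \neg r_t(w) \wedge \neg r_e(w) \wedge \neg is(w)$$

However, S' is not a bounded structure, because nodes $u_2$ and $u_3$ have the same values of unary predicates. To distinguish between these nodes, we extended $\mathit{node}^{S'}_{u_2}(w)$ with the underlined subformula, which captures the fact that only $u_2$ is directly pointed to by an n-edge from $u_1$.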













Fig. 4. A 3-valued structure S' is FO-identifiable, but not bounded.
It can be shown that every FO-identifiable structure can be converted into a bounded structure by introducing more instrumentation predicates. For methodological reasons, we use the notion of FO-identifiability, which directly captures the ability to uniquely identify the embedding via (FO) formulas. One of the interesting features of FO-identifiable structures is that the structures generated by a common TVLA operation, "focus", defined in [26], are all FO-identifiable (see Lemma D2 in Appendix D). For example, Fig. 4 shows the structure S', which is one of the structures resulting from applying the "focus" operation to the structure S from Fig. 2(d) with the formula $\exists v_1 : x(v_1) \wedge n(v_1, v_2)$. S' is FO-identifiable, but not bounded. However, structures like the one shown in Fig. 3 are not FO-identifiable unless P = NP.

3.2 Characterizing FO-identifiable structures 

To characterize an FO-identifiable 3-valued structure, we must ensure:

1. the existence of a surjective embedding function. 

2. that every concrete node is represented by some abstract node. 

3. that corresponding concrete and abstract predicate values meet the embedding con- 
dition of Eq. (1). 

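Definition 11. (First-order Characteristic Formula) Let $S = \langle U = \{u_1, u_2, \ldots, u_n\}, \iota \rangle$ be an FO-identifiable 3-valued structure.

We define the totality characteristic formula to be the closed formula:

$$\xi^{S}_{\mathit{total}} \stackrel{\text{def}}{=} \forall w : \bigvee_{i=1}^{n} \mathit{node}^{S}_{u_i}(w) \qquad (5)$$

For a predicate p of arity 0, we define the nullary characteristic formula to be the closed formula:

$$\xi^{S}[p] \stackrel{\text{def}}{=} p^{\iota^{S}(p)()} \qquad (6)$$

For a predicate p of arity $r \geq 1$, we define the predicate characteristic formula to be the closed formula:

$$\xi^{S}[p] \stackrel{\text{def}}{=} \forall w_1, \ldots, w_r : \bigwedge_{u'_1, \ldots, u'_r \in U} \Big( \bigwedge_{j=1}^{r} \mathit{node}^{S}_{u'_j}(w_j) \Rightarrow p^{\iota^{S}(p)(u'_1, \ldots, u'_r)}(w_1, \ldots, w_r) \Big) \qquad (7)$$

The characteristic formula of S is defined by:

$$\xi^{S} \stackrel{\text{def}}{=} \bigwedge_{u \in U} \big(\exists v : \mathit{node}^{S}_{u}(v)\big) \wedge \xi^{S}_{\mathit{total}} \wedge \bigwedge_{p \in \mathcal{P}_0} \xi^{S}[p] \wedge \bigwedge_{p \in \mathcal{P} \setminus \mathcal{P}_0} \xi^{S}[p] \qquad (8)$$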



In subsequent sections, we redefine this notion to capture other classes of structures. 
The characteristic formula of a set X ⊆ 3-STRUCT[P] is defined by:

$$\widehat{\gamma}(X) \stackrel{\text{def}}{=} F \wedge \bigvee_{S \in X} \xi^{S} \qquad (9)$$

Finally, for a singleton set X = {S}, we write $\widehat{\gamma}(S)$ instead of $\widehat{\gamma}(\{S\})$.
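To illustrate Definition 11, the following Python sketch assembles the conjuncts of Eq. (8) as strings, keeping node formulas symbolic (written node_u(w)). It emits one implication per tuple of abstract nodes instead of the single universally quantified conjunction of Eq. (7), which is equivalent because the universal quantifier distributes over conjunction, and it drops every implication whose value is 1/2, since $p^{1/2} = 1$. Only the binary predicate n of the structure S from Fig. 2(d) is encoded; its definite entries are read off from Example 8, and the two entries that Example 8 omits are taken to be 1/2. All names are hypothetical.

```python
from itertools import product

HALF = 0.5  # indefinite truth value 1/2


def xi_conjuncts(nodes, preds):
    """Conjuncts of Eq. (8); preds maps name -> (arity, {node tuple: value})."""
    conj = [f"∃v: node_{u}(v)" for u in nodes]
    conj.append("∀w: " + " ∨ ".join(f"node_{u}(w)" for u in nodes))  # Eq. (5)
    for p, (arity, table) in preds.items():
        if arity == 0:  # nullary characteristic formula, Eq. (6)
            if table[()] != HALF:
                conj.append(p if table[()] == 1 else f"¬{p}")
            continue
        ws = [f"w{i}" for i in range(1, arity + 1)]
        for us in product(nodes, repeat=arity):  # Eq. (7), one tuple at a time
            value = table[us]
            if value == HALF:
                continue  # p^{1/2} = 1, so the implication is trivially true
            premise = " ∧ ".join(f"node_{u}({w})" for u, w in zip(us, ws))
            atom = f"{p}({', '.join(ws)})"
            literal = atom if value == 1 else "¬" + atom
            conj.append(f"∀{', '.join(ws)}: ({premise} ⇒ {literal})")
    return conj


n_table = {("u1", "u1"): 0, ("u1", "u2"): HALF, ("u2", "u1"): 0, ("u2", "u2"): HALF}
for c in xi_conjuncts(["u1", "u2"], {"n": (2, n_table)}):
    print(c)
```

The output has one existential conjunct per abstract node, the totality conjunct, and only two conjuncts for n, which illustrates why the size of the simplified formula is linear in the number of definite values.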

The main ideas behind the four conjuncts of Eq. (8) are: 

- The existential quantification in the first conjunct requires that the 2-valued structures have at least n distinct nodes: for each abstract node in S, the corresponding subformula locates a concrete node that embeds into it, and these concrete nodes are distinct because the node formulas of distinct abstract nodes are mutually exclusive. Overall, this conjunct guarantees that the embedding is surjective.

- The totality formula ensures that every concrete node is represented by some abstract node. It guarantees that the embedding function is well-defined.

- The nullary characteristic formulas ensure that the values of nullary predicates in the 2-valued structures are at least as precise as the values of the corresponding nullary predicates in S.

- The predicate characteristic formulas guarantee that predicate values in the 2-valued structures obey the requirements imposed by an embedding into S.

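Example 8. After a small amount of simplification, the characteristic formula $\widehat{\gamma}(S)$ for the structure S shown in Fig. 2 is $F_{\mathit{List}} \wedge \xi^{S}$, where $\xi^{S}$ is:

$$\begin{array}{l}
\exists v : \mathit{node}^{S}_{u_1}(v) \wedge \exists v : \mathit{node}^{S}_{u_2}(v) \\
\wedge\ \forall w : \mathit{node}^{S}_{u_1}(w) \vee \mathit{node}^{S}_{u_2}(w) \\
\wedge\ \bigwedge_{p \in \mathcal{P}_1} \forall w_1 : \bigwedge_{i=1,2} \big(\mathit{node}^{S}_{u_i}(w_1) \Rightarrow p^{\iota^{S}(p)(u_i)}(w_1)\big) \\
\wedge\ \forall w_1, w_2 : \big(\mathit{node}^{S}_{u_1}(w_1) \wedge \mathit{node}^{S}_{u_1}(w_2) \Rightarrow \mathit{eq}(w_1, w_2) \wedge \neg n(w_1, w_2) \wedge \neg n(w_2, w_1)\big) \\
\qquad \wedge\ \big(\mathit{node}^{S}_{u_1}(w_1) \wedge \mathit{node}^{S}_{u_2}(w_2) \Rightarrow \neg \mathit{eq}(w_1, w_2) \wedge \neg n(w_2, w_1)\big)
\end{array}$$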

The node formulas are given in Example 6, and the predicates for the insert program in Fig. 1(b) are shown in Table 1. Above, we simplified the formula from Eq. (8) by combining implications that have the same premises. The integrity formula $F_{\mathit{List}}$ is given in Example 2. Note that it uses transitive closure to define the reachability predicates; consequently, $\widehat{\gamma}(S)$ is a formula in first-order logic with transitive closure.

When a predicate has an indefinite value on some node tuple, a corresponding con- 
junct of Eq. (7) can be omitted, because it simplifies to 1. 

Thus, the size of this simplified version of $\xi^{S}$ is linear in the number of definite values of predicates in S. Assuming that the $\mathit{node}^{S}$ formulas contain no quantifiers or transitive-closure operator (e.g., when S is bounded), the $\xi^{S}$ formula has no quantifier alternation and does not contain any occurrences of the transitive-closure operator. Thus, the formula $\widehat{\gamma}$ is in Existential-Universal normal form (and thus decidable for satisfiability) whenever F is in Existential-Universal normal form and does not contain transitive closure. Moreover, if the maximal arity of the predicates in P is 2, then $\widehat{\gamma}$ is in the two-variable fragment of first-order logic [31] whenever F is. In Section 5, we discuss other conditions under which $\widehat{\gamma}$ can be expressed in a decidable logic.

Definition 11 relates to all FO-identifiable structures, not only to bounded structures. For bounded structures, it can be simplified by omitting $\xi^{S}[p]$ for all unary predicates p, because it is implied by $\xi^{S}_{\mathit{total}}$. In fact, it can be omitted only for the abstraction predicates, as defined in [37]; however, throughout this paper we consider all unary predicates to be abstraction predicates.

The following theorem shows that for every FO-identifiable structure S, the formula $\widehat{\gamma}(S)$ accepts exactly the set of 2-valued structures represented by S.

Theorem 1. For every FO-identifiable 3-valued structure S and 2-valued structure $S^{\natural}$,

$$S^{\natural} \in \gamma(S) \iff S^{\natural} \models \widehat{\gamma}(S)$$
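For a bounded S and a finite 2-valued structure, the right-hand side of Theorem 1 can be checked by evaluating the formulas directly instead of calling a theorem prover. The Python sketch below does this for the node formulas of Eq. (4) and the conjuncts of Eqs. (5), (7), and (8); the integrity formula F is deliberately left out, so it decides $S^{\natural} \models \xi^{S}$ only. The structure is a hypothetical excerpt of S from Fig. 2(d) restricted to x, $r_x$, n, and eq, with values consistent with Examples 6 and 8 (eq is 1/2 on the summary node $u_2$). Unary conjuncts of Eq. (7) are not checked separately, because for a bounded S they hold as soon as each concrete node satisfies a node formula.

```python
HALF = 0.5  # indefinite truth value 1/2


def leq(a, b):
    return a == b or b == HALF


def satisfies_xi(abs_unary, abs_binary, nodes, conc_unary, conc_binary):
    """Decide S♮ ⊨ ξ^S for a bounded S by evaluating the formulas directly.

    abs_unary: abstract node -> {pred: value}; abs_binary: pred -> {(u, u'): value};
    conc_unary: concrete node -> {pred: 0/1}; conc_binary: pred -> set of node pairs.
    """
    def holds(c, u):  # node^S_u(w) of Eq. (4), evaluated at concrete node c
        return all(v == HALF or conc_unary[c][p] == v for p, v in abs_unary[u].items())

    f = {}
    for c in nodes:
        hits = [u for u in abs_unary if holds(c, u)]
        if len(hits) != 1:  # for a bounded S this means no hit: totality (5) fails
            return False
        f[c] = hits[0]
    if set(f.values()) != set(abs_unary):  # first conjunct of Eq. (8)
        return False
    return all(  # Eq. (7) for the binary predicates
        leq(int((c1, c2) in conc_binary[p]), table[(f[c1], f[c2])])
        for p, table in abs_binary.items()
        for c1 in nodes
        for c2 in nodes
    )


abs_unary = {"u1": {"x": 1, "r_x": 1}, "u2": {"x": 0, "r_x": 1}}
abs_binary = {
    "n": {("u1", "u1"): 0, ("u1", "u2"): HALF, ("u2", "u1"): 0, ("u2", "u2"): HALF},
    "eq": {("u1", "u1"): 1, ("u1", "u2"): 0, ("u2", "u1"): 0, ("u2", "u2"): HALF},
}


def concrete_list(k, cyclic=False):
    """A list of k nodes pointed to by x; optionally the last node points back to the head."""
    nodes = list(range(k))
    unary = {i: {"x": int(i == 0), "r_x": 1} for i in nodes}
    n = {(i, i + 1) for i in range(k - 1)}
    if cyclic:
        n.add((k - 1, 0))
    return nodes, unary, {"n": n, "eq": {(i, i) for i in nodes}}


for k in (1, 2, 4):
    print(k, satisfies_xi(abs_unary, abs_binary, *concrete_list(k)))
# prints 1 False, then 2 True, then 4 True
```

With these values, a 2-node list satisfies $\xi^{S}$, in line with the embedding-based reading of S in Section 2.4; a 1-node list does not, because $u_2$ has no concrete counterpart; and a cyclic list is rejected by the entry $n(u_2, u_1) = 0$.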

4 Supervaluational Semantics for First-Order Formulas

In this section, we consider the problem of how to extract information from a 3-valued structure by evaluating a query. A compositional semantics for 3-valued first-order logic is defined in [37]; however, that semantics is not as precise as the one defined here. The semantics given in this section can be seen as providing the limit of obtainable precision.

The notion of supervaluational semantics defined below has been used in [38, 3].

Definition 12. (Supervaluational Semantics of First-Order Formulas) Let X be a set of 3-valued structures and $\varphi$ be a closed formula. The supervaluational semantics of $\varphi$ in X, denoted by $(\!(\varphi)\!)(X)$, is defined to be the join of the values of $\varphi$ obtained from each of the 2-valued structures that X represents, i.e., the most-precise conservative value that can be reported for the value of formula $\varphi$ in the 2-valued structures represented by X:

$$(\!(\varphi)\!)(X) \stackrel{\text{def}}{=} \begin{cases} 1 & \text{if } S^{\natural} \models \varphi \text{ for all } S^{\natural} \in \gamma(X) \\ 0 & \text{if } S^{\natural} \not\models \varphi \text{ for all } S^{\natural} \in \gamma(X) \\ 1/2 & \text{otherwise} \end{cases} \qquad (10)$$

The compositional semantics given in [37] and used in TVLA can yield 1/2 for $\varphi$ even when the value of $\varphi$ is 1 for all the 2-valued structures $S^{\natural}$ that S represents (or when the value of $\varphi$ is 0 for all the $S^{\natural}$). In contrast, when the supervaluational semantics yields 1/2, we know that any sound extraction of information from S must return 1/2.

Example 9. We now demonstrate that the supervaluational semantics of the formula $\varphi_{x \rightarrow next \neq NULL} \stackrel{\text{def}}{=} \exists v_1, v_2 : x(v_1) \wedge n(v_1, v_2)$ on the structure S from Fig. 2(d) is 1. That is, we wish to argue that for all of the 2-valued structures that structure S from Fig. 2(d) represents, the value of the formula $\varphi_{x \rightarrow next \neq NULL}$ must be 1.

We reason as follows: S represents a list with at least two nodes; i.e., all 2-valued structures represented by S have at least two nodes. One node, $u^{\natural}_1$, corresponding to $u_1$ in S, is pointed to by program variable x. The other nodes, corresponding to the summary node $u_2$, must be reachable from x. Consider the sequence of nodes reachable from x, starting with $u^{\natural}_1$. Denote the first node in the sequence that embeds into $u_2$ by $u^{\natural}_2$. By the definition of reachability, there must be an n-link to $u^{\natural}_2$ from a node embedded into $u_1$. But the integrity rules guarantee that there is exactly one node that embeds into $u_1$, namely, $u^{\natural}_1$. Therefore, the formula $x(v_1) \wedge n(v_1, v_2)$ holds for the assignment $[v_1 \mapsto u^{\natural}_1, v_2 \mapsto u^{\natural}_2]$.

For practical reasons, we often replace the node formula by a new (definable) predicate, and add its definition to the integrity formula.

Note that TVLA evaluates this formula to 1/2: under the assignment $[v_1 \mapsto u_1, v_2 \mapsto u_2]$, the compositional semantics yields $x(u_1) \wedge n(u_1, u_2) = 1 \wedge 1/2 = 1/2$, and no assignment makes $x(v_1) \wedge n(v_1, v_2)$ evaluate to 1.
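The difference between the two semantics in Example 9 can be replayed in a few lines of Python. The sketch below encodes only x and n of S (values as in Examples 8 and 9, with 0.5 for 1/2), evaluates $\varphi_{x \rightarrow next \neq NULL}$ compositionally, with existential quantification as maximum and conjunction as minimum, and compares the result with the 2-valued value on a few concrete lists. Sampling finitely many lists only illustrates the supervaluational value 1; it does not prove it.

```python
HALF = 0.5  # indefinite truth value 1/2

# Values of x and n in the structure S of Fig. 2(d), as used in Examples 8 and 9.
X = {"u1": 1, "u2": 0}
N = {("u1", "u1"): 0, ("u1", "u2"): HALF, ("u2", "u1"): 0, ("u2", "u2"): HALF}


def kleene_phi(nodes, x, n):
    """Compositional value of ∃v1, v2: x(v1) ∧ n(v1, v2): ∃ is max, ∧ is min."""
    return max(min(x[a], n[(a, b)]) for a in nodes for b in nodes)


def concrete_phi(k):
    """2-valued value of the same formula on a list of k >= 2 nodes headed by x."""
    x = {0}
    n = {(i, i + 1) for i in range(k - 1)}
    return int(any(a in x and (a, b) in n for a in range(k) for b in range(k)))


print(kleene_phi(["u1", "u2"], X, N))          # prints 0.5
print([concrete_phi(k) for k in range(2, 7)])  # prints [1, 1, 1, 1, 1]
```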

Notice that Definition 12 does not provide a constructive way to compute $(\!(\varphi)\!)(X)$, because $\gamma(X)$ is usually an infinite set.

Computing Supervaluational Semantics using Theorem Provers. If an appropriate theorem prover is at hand, $(\!(\varphi)\!)(S)$ can be computed with the procedure shown in Fig. 5. This procedure is not an algorithm, because the theorem prover might not terminate. Termination can be assured by using standard techniques (e.g., having the theorem prover return a safe answer if a time-out threshold is exceeded) at the cost of losing the ability to guarantee that a most-precise result is obtained. If the queries posed by operation Supervaluation can be expressed in a decidable logic, the algorithm for computing supervaluation can be implemented using a decision procedure for that logic. In Section 5, we discuss such decidable logics that are useful for shape analysis.

procedure Supervaluation($\varphi$: Formula, X: Set of 3-valued structures): Value
    if ($\widehat{\gamma}(X) \Rightarrow \varphi$ is valid) return 1;
    else if ($\widehat{\gamma}(X) \Rightarrow \neg\varphi$ is valid) return 0;
    otherwise return 1/2;

Fig. 5. A procedure for computing the supervaluational value of a formula $\varphi$ that encodes a query on a set X of 3-valued structures.
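The following Python sketch mirrors Fig. 5 with a pluggable validity check. Since no theorem prover is assumed to be available here, bounded_valid is a stand-in that enumerates every structure over a unary predicate x and a binary predicate n with at most three nodes; it can report "valid" for formulas that fail only on larger structures, so its answers are merely evidence. The predicate is_list_from_x is a hypothetical substitute for $\widehat{\gamma}(S)$ with S from Fig. 2(d): x points to the head of an acyclic, unshared list of at least two nodes.

```python
from itertools import product

HALF = 0.5  # indefinite truth value 1/2


def structures(max_size):
    """All 2-valued structures (size, x, n) with 1..max_size nodes."""
    for size in range(1, max_size + 1):
        pairs = [(a, b) for a in range(size) for b in range(size)]
        for x_bits in product((0, 1), repeat=size):
            x = frozenset(i for i, bit in enumerate(x_bits) if bit)
            for n_bits in product((0, 1), repeat=len(pairs)):
                yield size, x, frozenset(p for p, bit in zip(pairs, n_bits) if bit)


def bounded_valid(formula, max_size=3):
    """Stand-in for a theorem prover: inspects only structures up to max_size nodes."""
    return all(formula(s) for s in structures(max_size))


def supervaluation(phi, gamma_hat, is_valid):
    """Fig. 5: two validity queries decide between 1, 0, and 1/2."""
    if is_valid(lambda s: not gamma_hat(s) or phi(s)):
        return 1
    if is_valid(lambda s: not gamma_hat(s) or not phi(s)):
        return 0
    return HALF


def is_list_from_x(s):
    """Hypothetical γ̂(S): x points to the head of an acyclic, unshared list of >= 2 nodes."""
    size, x, n = s
    succ = dict(n)  # successor map; shorter than n if a node has two successors
    if size < 2 or len(x) != 1 or len(succ) != len(n):
        return False
    cur, seen = next(iter(x)), set()
    while cur is not None and cur not in seen:
        seen.add(cur)
        cur = succ.get(cur)
    return cur is None and len(seen) == size  # no cycle, every node reached


def next_not_null(s):  # ∃v1, v2: x(v1) ∧ n(v1, v2)
    size, x, n = s
    return any(a in x and (a, b) in n for a in range(size) for b in range(size))


def at_least_three(s):
    return s[0] >= 3


def x_is_null(s):  # ∀v: ¬x(v)
    return not s[1]


print(supervaluation(next_not_null, is_list_from_x, bounded_valid),
      supervaluation(at_least_three, is_list_from_x, bounded_valid),
      supervaluation(x_is_null, is_list_from_x, bounded_valid))  # prints 1 0.5 0
```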

5 Applications 

The experiments discussed in this section demonstrate how the $\widehat{\gamma}$ operation can be harnessed in the context of program analysis: the results described below go beyond what previous systems were capable of. In Section 5.1, we discuss the use of existing theorem provers and their limitations. In Section 5.2, we suggest a way to overcome these limitations using decidable logics.

We present two examples that use $\widehat{\gamma}$ to read out information from 3-valued structures in a conservative, but rather precise, way. The first example demonstrates how supervaluational semantics allows us to obtain more precise information from a 3-valued structure than we would have using compositional semantics. The second example demonstrates how to use the 3-valued structures obtained from a TVLA analysis to construct a loop invariant; this is then used to show that certain properties of a linked data structure hold on each loop iteration. In addition, we briefly describe how $\widehat{\gamma}$ can be used in algorithms for computing most-precise abstraction operations for shape analysis. Finally, we report on other work that employs $\widehat{\gamma}$ to generate a concrete counter-example for shape analysis.

Remark. The $\widehat{\gamma}$ operation defines a symbolic concretization with respect to a given abstract domain. In Section 3, we defined $\widehat{\gamma}$ for the abstract domain of sets of 3-valued structures. In Appendix A, we describe a related abstract domain and define $\widehat{\gamma}$ for it. The applications described in this section can be used with any domain for which $\widehat{\gamma}$ is defined in some logic and a theorem prover for that logic exists. In our examples, we use the $\widehat{\gamma}$ defined in Section 3 and first-order logic with transitive closure.

5.1 Using the First-Order Theorem Prover SPASS 

The TVLA system [27] performs an iterative fixed-point computation, which yields at every program point p a set $X_p$ of bounded structures. It guarantees that $\gamma(X_p)$ is a superset of the 2-valued structures that can arise at p in any execution. We have implemented the $\widehat{\gamma}$ operation in TVLA, and employed SPASS [39] to check, using the formula $\widehat{\gamma}(X_p)$, that certain properties of the heap hold at program point p. We also implemented the supervaluational procedure described in Section 4 using SPASS. The enhanced version of TVLA generates the formula $\widehat{\gamma}(S)$ and makes at most two calls to SPASS to compute the supervaluational value of a query in structure S. In this section, we report on our experience in using SPASS and the problems we have encountered.

First, calls to the SPASS theorem prover need not terminate, because validity in first-order logic is undecidable in general. However, in the examples described below, SPASS always terminated.

Example 10. In Example 9 we (manually) proved that the supervaluational value of the formula $\varphi_{x \rightarrow next \neq NULL}$ on the structure S from Fig. 2(d) is 1. To check this automatically, we used SPASS to determine the validity of $\widehat{\gamma}(S) \Rightarrow \varphi_{x \rightarrow next \neq NULL}$; SPASS indicated that the formula is valid. This guarantees that the formula $\varphi_{x \rightarrow next \neq NULL}$ evaluates to 1 on all of the 2-valued structures that embed into S.

In contrast, TVLA uses Kleene semantics for 3-valued formulas, and will evaluate $\varphi_{x \rightarrow next \neq NULL}$ to 1/2: under the assignment $[v_1 \mapsto u_1, v_2 \mapsto u_2]$, $x(v_1) \wedge n(v_1, v_2)$ evaluates to $1 \wedge 1/2$, which equals 1/2.

Generating and Querying a Loop Invariant. We used TVLA to compute, for each program point p, a set $X_p$ of bounded structures that overapproximates the set of stores that may occur at that point. We then generated $\widehat{\gamma}(X_p)$. Because TVLA is sound, $\widehat{\gamma}(X_p)$ must be an invariant that holds at program point p, according to Theorem 1. In particular, when p is a program point that begins a loop, $\widehat{\gamma}(X_p)$ is a loop invariant.

Example 11. Let $X = \{S_i \mid i = 1, \ldots, 5\}$ denote the set of five 3-valued structures that TVLA found at the beginning of the loop in the insert program from Fig. 1(b). Table 2 and Table 3 of Appendix C show the $S_i$ and their characteristic formulas. The loop invariant is defined by

$$\widehat{\gamma}(X) = F_{\mathit{List}} \wedge \Big(\bigvee_{i=1}^{5} \xi^{S_i}\Big)$$

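Using SPASS, this formula was then used to check that in every structure that can occur at the beginning of the loop, x points to a valid list, i.e., one that is acyclic and unshared. This property is defined by the following formulas:

$$\mathit{acyc}_x \stackrel{\text{def}}{=} \forall v_1, v_2 : r_x(v_1) \wedge n^{+}(v_1, v_2) \Rightarrow \neg n^{+}(v_2, v_1)$$

$$\mathit{uns}_x \stackrel{\text{def}}{=} \forall v : r_x(v) \Rightarrow \neg\big(\exists w_1, w_2 : \neg \mathit{eq}(w_1, w_2) \wedge n(w_1, v) \wedge n(w_2, v)\big)$$

$$\mathit{list}_x \stackrel{\text{def}}{=} \mathit{acyc}_x \wedge \mathit{uns}_x$$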
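On a finite concrete heap, $\mathit{acyc}_x$, $\mathit{uns}_x$, and $\mathit{list}_x$ can be evaluated directly, which is a convenient way to sanity-check such formulas before handing them to a prover. The Python sketch below does so for heaps given as a node list, the set of nodes pointed to by x, and the set of n-edges. It assumes that $r_x(v)$ holds exactly when v is pointed to by x or reachable from such a node via $n^{+}$; that reading of the reachability predicate is an assumption of the sketch.

```python
def transitive_closure(rel):
    """Smallest transitive relation containing rel (computes n+)."""
    closure = set(rel)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c} - closure
        if not new:
            return closure
        closure |= new


def reachable_from_x(nodes, x, n):
    """r_x(v): v is pointed to by x or reachable from such a node via n+ (assumed)."""
    tc = transitive_closure(n)
    return {v for v in nodes if v in x or any((w, v) in tc for w in x)}


def is_list_x(nodes, x, n):
    """list_x = acyc_x ∧ uns_x, evaluated on a finite 2-valued structure."""
    tc = transitive_closure(n)
    r_x = reachable_from_x(nodes, x, n)
    acyc = all(not ((v1, v2) in tc and (v2, v1) in tc) for v1 in r_x for v2 in nodes)
    uns = all(sum((w, v) in n for w in nodes) < 2 for v in r_x)  # < 2 predecessors
    return acyc and uns


print(is_list_x([0, 1, 2], {0}, {(0, 1), (1, 2)}))  # prints True
```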
We applied SPASS to check the validity of $\widehat{\gamma}(X) \Rightarrow \mathit{list}_x$; SPASS indicated that the formula is valid.

In addition to the termination issue, a second obstacle is that SPASS considers infinite structures, which are not allowed in our setting. As a consequence, SPASS can fail to verify that a formula is valid for our intended set of structures; however, the opposite can never happen: whenever SPASS indicates that a formula is valid, it is indeed valid for our intended set of structures.

Example 12. We tried to verify that every concrete linked list represented by the 3-valued structure S from Fig. 2(d) has a last element. This condition is expressed by the formula $\varphi_{\mathit{last}} \stackrel{\text{def}}{=} \exists v_1 \forall v_2 : \neg n(v_1, v_2)$. The supervaluational value of $\varphi_{\mathit{last}}$ on the structure S is $(\!(\varphi_{\mathit{last}})\!)(S) = 1$, for the following reasons. Because $r_x$ has the definite value 1 on $u_2$ in S, all concrete nodes represented by the summary node $u_2$ must be reachable from x. Thus, these nodes must form a linked list, i.e., each of these concrete nodes, except for one node that is the "last", has an n-edge to another concrete node represented by $u_2$. The last node does not have an n-edge back to any of the nodes represented by $u_2$, because that would create sharing, whereas the predicate is has the value 0 on $u_2$ in S. Also, the last node cannot have an n-edge to the concrete node represented by $u_1$, because the value of predicate n on the pair $(u_2, u_1)$ in S is 0. Therefore, the last element cannot have an outgoing n-edge.

We used SPASS to determine the validity of $\widehat{\gamma}(S) \Rightarrow \varphi_{\mathit{last}}$; SPASS indicated that the formula is not valid, because it considered a structure that has infinitely many concrete nodes, all represented by $u_2$, each of which has an n-edge to the next node.

The validity test of the formula $\widehat{\gamma}(S) \Rightarrow \neg\varphi_{\mathit{last}}$ failed, of course, because there exists a finite structure that is represented by S (and thus satisfies $\widehat{\gamma}(S)$) and has a last element, for example, the structure in Fig. 2(a), which represents a list of size 2. Therefore, the procedure Supervaluation($\varphi_{\mathit{last}}$, S) implemented using SPASS returns 1/2, even though the supervaluational value of $\varphi_{\mathit{last}}$ on S is 1.

The third, and most severe, problem that we face is that SPASS does not support 
transitive closure. Because transitive closure is not expressible in first-order logic, we 
could only partially model transitive closure in SPASS, as described below. 

SPASS follows other theorem provers in allowing axioms to express requirements on the set of structures considered. We used SPASS axioms to model integrity rules. To partially model transitive closure, we replaced uses of $n^{+}(v_1, v_2)$ by uses of a new designated predicate $t[n](v_1, v_2)$. Therefore, SPASS will consider some structures that do not represent possible stores. As a consequence, SPASS can fail to verify that a formula is valid for our intended set of structures; however, the opposite can never happen: whenever SPASS indicates that a formula is valid, it is indeed valid for our intended set of structures. To avoid some of the spurious failures to prove validity, we added axioms to guarantee that (i) $t[n](v_1, v_2)$ is transitive and (ii) $t[n](v_1, v_2)$ includes all of $n(v_1, v_2)$; thus, $t[n](v_1, v_2)$ includes all of $n^{+}(v_1, v_2)$. Because transitive closure requires a minimal set, which is not expressible in first-order logic, this approach provides a looser set of integrity rules than we would like. However, it is still the case that whenever SPASS indicates that a formula is valid, it is indeed valid for the set of structures in which $t[n](v_1, v_2)$ is exactly $n^{+}(v_1, v_2)$.

SPASS input is available from www.cs.tau.ac.il/~gretay.

Our intended structures are finite, because they represent memory configurations, which are guaranteed to be finite, although their size is not bounded.

Example 13. SPASS takes into account the structure shown in Fig. 6, in which the value of $t[n](u_1, u_3)$ is 1, but the value of $n^{+}(u_1, u_3)$ is 0, because there is no n-edge from $u_2$ to $u_3$.





Fig. 6. SPASS takes into account structures in which the t[n] predicate overapproximates the $n^{+}$ predicate, such as the structure shown in this figure.
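The gap between the two axioms for $t[n]$ and the least relation they are meant to describe can be seen on a three-node example. The Python sketch below checks axioms (i) and (ii) and compares $t[n]$ with the transitive closure of n. Since Fig. 6 is not reproduced in this text, the sketch assumes a reading consistent with Example 13: a single n-edge from $u_1$ to $u_2$, and a $t[n]$ that additionally relates $u_1$ to $u_3$.

```python
def transitive_closure(rel):
    """Least transitive relation containing rel: the intended meaning of n+."""
    closure = set(rel)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c} - closure
        if not new:
            return closure
        closure |= new


def satisfies_tn_axioms(n, t):
    """The two first-order axioms: (i) t is transitive, (ii) t contains n."""
    transitive = all((a, d) in t for (a, b) in t for (c, d) in t if b == c)
    return transitive and n <= t


# Hypothetical reading of Fig. 6: one n-edge u1 -> u2; t[n] also relates u1 to u3.
n = {("u1", "u2")}
t = {("u1", "u2"), ("u1", "u3")}
print(satisfies_tn_axioms(n, t))              # prints True
print(("u1", "u3") in transitive_closure(n))  # prints False
```

Both axioms are satisfied, yet $t[n]$ is strictly larger than $n^{+}$; first-order axioms of this kind can force $t[n] \supseteq n^{+}$ but cannot express minimality.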

5.2 Decidable Logic 

The obstacles mentioned in Section 5.1 are not specific to SPASS. They occur in all theorem provers for first-order logic that we are aware of. To address these obstacles, we are investigating the use of a decidable logic. To reason about linked data structures, we need a notion of reachability to be expressible, for example, using transitive closure. However, a logic that is both decidable and includes reachability must be limited in other aspects.

One such example is the decidable weak monadic second-order theory of two successors, WS2S [33]; its decision procedure is implemented in a tool called MONA [17]. Second-order quantification suffices to express reachability, but there are still two problems. First, the decision procedure for WS2S is necessarily non-elementary [29]. Second, WS2S only applies to trees or, equivalently, to function graphs (graphs with at most one edge leaving any vertex).

Another example is $EA(TC, f^{1})$, which is a subset of first-order logic with transitive closure in which the following restrictions are imposed on formulas: (i) they must be in existential-universal form, and (ii) they must use at most a single unary function f, but can use an arbitrary number of unary predicates. [19] shows that the satisfiability problem for $EA(TC, f^{1})$ is NEXPTIME-complete.

In spite of their limitations, both WS2S and $EA(TC, f^{1})$ can be useful for reasoning about shape invariants and mutation operations on data structures, such as singly and doubly linked lists, (shared) trees, and graph types [22]. The key is the simulation technique [20], which encodes complex data structures using tractable structures, e.g., function graphs or simple trees, for which we can reason with decidable logics.

For example, given a suitable simulation, the $\widehat{\gamma}$ formula can be expressed in WS2S and in $EA(TC, f^{1})$ if the integrity formula F can. This follows from the definition of $\widehat{\gamma}$ in Eq. (9) and the fact that $\xi^{S}$ does not contain quantifier alternation. This makes $EA(TC, f^{1})$ and WS2S candidate implementations for the decision procedure used in the supervaluational semantics and in the algorithms described below.

5.3 Assume-Guarantee Shape Analysis 

The $\widehat{\gamma}$ operation is useful beyond computing supervaluational semantics: it is a necessary operation used in the algorithms described in [42, 34]. These algorithms perform abstract operations symbolically by representing abstract values as logical formulas, and use a theorem prover to check validity of these formulas. These algorithms improve on existing shape-analysis techniques by:

- conducting abstract interpretation in the most-precise fashion, improving the tech- 
nique used in the TVLA system [27, 37], which provides no guarantees about the 
precision of its basic mechanisms. 

- performing modular verification using assume-guarantee reasoning and procedure specifications. This is perhaps the most exciting potential application of $\widehat{\gamma}$ (and of $EA(TC, f^{1})$ logic), because existing mechanisms for shape analysis, including TVLA, do not support assume-guarantee reasoning.

5.4 Counter-example Generation 

Some preliminary work on using the techniques presented in this paper to improve the applicability of TVLA has been carried out. The tool described in [10, 9] uses the $\widehat{\gamma}$ operation to generate a concrete counter-example for a potential error message produced by TVLA for an intermediate 3-valued structure S at a program point p. Such a tool is useful to check whether a reported error is a real error or a false alarm, i.e., an error that never occurs on any concrete store.

Generation of concrete counter-examples from S proceeds as follows. First, S is converted to the formula $\widehat{\gamma}(S)$. Then, the tool uses weakest preconditions to generate a formula that represents the stores at the entry point that lead to an execution trace that reaches p with a store that satisfies $\widehat{\gamma}(S)$. Finally, a separate tool [28] generates a concrete store that satisfies the formula for the entry point.

6 Related Work 

There is a sizeable literature on structure-description formalisms for describing properties of linked data structures (see [1, 37] for references). The motivation for the present paper was to understand the expressive power of the shape abstractions defined in [37].

In previous work, Benedikt et al. [1] showed how to translate two kinds of shape descriptors, "path matrices" [14, 16] and the variant of shape graphs discussed in [35], into a logic called $L_r$ ("logic of reachability expressions"). The shape graphs from [35] are also amenable to the techniques presented in the present paper: the characteristic formula defined in Eq. (8) is much simpler than the translation to $L_r$ given in [1]; moreover, Eq. (8) applies to a more general class of shape descriptors. However, the logic used in [1] is decidable, which guarantees that terminating procedures can be given for problems that can be addressed using $L_r$.

The Pointer Analysis Logic Engine (PALE) [30] provides a structure-description formalism that serves as an assertion language; assertions are translated to monadic second-order logic and fed to MONA. PALE does not handle arbitrary data structures, but it can handle all data structures describable as graph types [22]. Because the logic used by MONA is decidable, PALE is guaranteed to terminate.

One point of contrast between the shape abstractions based on 3-valued structures studied in this paper and both $L_r$ and the PALE assertion language is that the powerset of 3-valued structures forms an abstract domain. This means that 3-valued structures can be used for program analysis by setting up an appropriate set of equations and finding its fixed point [37]. In contrast, when PALE is used for program analysis, an invariant must be supplied for each loop.

Other structure-description formalisms in the literature include ADDS [15] and 
shape types [12]. 

The supervaluational semantics for first-order logic discussed in Section 4 is related to a number of other supervaluational semantics for partial logics and 3-valued logics discussed in the literature [38, 2, 3]. Compared to previous work, an innovation of Fig. 5 is the use of $\widehat{\gamma}$ to translate a 3-valued structure to a formula. In fact, Fig. 5 is an example of a general reductionist strategy for providing a supervaluational evaluation procedure for abstract domains by using existing logics and theorem provers/decision procedures.

A recent work [23], which is an abbreviated version of a more extensive presentation of the results reported in [24], provides an alternative characterization of 3-valued structures using logical formulas, equivalent to the characterization presented in the present paper. The present paper, which extends and elaborates on the results of [41], unlike [23, 24], reports on experience and algorithmic issues in using logical characterizations of structures for shape analysis; this material is important because shape analysis is the primary motivation and the intended application of this paper, as well as of [23, 24]. Also, Section A.4 of the present paper gives a simple semantic argument for the property of closure under negation, shown in [24] using a different formalism. The technical similarities and differences between the two works are described in a note available from www.cs.tau.ac.il/~gretay.

7 Final Remarks 

In [34], we discuss how to perform all operations required for abstract interpretation 
in the most-precise way possible (relative to the abstraction in use), if certain primitive 
operations can be carried out, and if a sufficiently powerful theorem prover is at hand. 
Chief among the primitive operations that must be available is $\widehat{\gamma}$; thus, the material that
has been presented in this paper shows how to fulfill the requirements of [34] for a 
family of abstractions based on 3-valued structures (essentially those used in our past 
work [37] and in the TVLA system [27]). 

In ongoing work, we are investigating the feasibility of actually applying the tech- 
niques from [34] to perform abstract interpretation for abstractions based on 3-valued 
structures. This approach could be more precise than TVLA because, for instance, it 
would take into account in a first-class way the integrity formula of the abstraction. In 
contrast, in TVLA some operations temporarily ignore the integrity formula, and rely 
on later clean-up steps to rectify matters. 

Another step can be taken in this direction, which is to eliminate the use of 3-valued structures and directly carry out fixed-point computations over logical formulas.
We are also investigating the feasibility of using the results from this paper to develop a more precise and modular version of TVLA by using assume-guarantee reasoning [42]. The idea is to allow arbitrary first-order formulas with transitive closure to be used to express pre- and post-conditions, and to analyze the code for each procedure separately.

Acknowledgements We thank Neil Immerman, Viktor Kuncak, Tal Lev-Ami, and 
Alexander Rabinovich for their contributions to this paper. 
A Characterizing Canonical Abstraction by First-Order Formulas



This section defines an alternative abstract domain for use in shape analysis (and other logic-based analyses). This domain keeps more explicit information than the one in Section 2.4 and enjoys nice closure properties (see Section A.4). This domain uses a particular class of embedding functions that are defined by a simple operation, called canonical abstraction, which maps 2-valued structures into a limited subset of bounded structures.

A.1 Canonical Abstraction

Canonical abstraction was defined in [36] as an abstraction with the following proper- 
ties: 

- It provides a uniform way to obtain 3-valued structures of a priori bounded size. 
This is important to automatically derive properties of programs with loops by em- 
ploying iterative fixed-point algorithms. Canonical abstraction maps concrete nodes 
into abstract nodes according to the definite values of the unary predicates. 

- The information loss is minimized when multiple nodes of S are mapped to the same node in S'.

This is formalized by the following definition: 

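Definition 13. A structure $S' = \langle U^{S'}, \iota^{S'} \rangle$ is a canonical abstraction of a structure S if $S \sqsubseteq^{\mathit{canonical}} S'$, where $\mathit{canonical} : U^{S} \rightarrow U^{S'}$ is the following surjective mapping:

$$\mathit{canonical}(u) = u_{\{p \in \mathcal{P}_1 \mid \iota^{S}(p)(u) = 1\}, \{p \in \mathcal{P}_1 \mid \iota^{S}(p)(u) = 0\}} \qquad (11)$$

and, for every $p \in \mathcal{P}_k$ of arity k,

$$\iota^{S'}(p)(u'_1, \ldots, u'_k) = \bigsqcup_{\substack{\mathit{canonical}(u_i) = u'_i \\ 1 \leq i \leq k}} \iota^{S}(p)(u_1, \ldots, u_k) \qquad (12)$$

We say that S' = canonical(S).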

The name $u_{\{p \in \mathcal{P}_1 \mid \iota^S(p)(u) = 1\},\ \{p \in \mathcal{P}_1 \mid \iota^S(p)(u) = 0\}}$ is known as the canonical name
of node $u$. The subscript on the canonical name of $u$ involves two sets of unary predicate
symbols: (i) those that are true at $u$, and (ii) those that are false at $u$.

Example 14. In structure S from Fig. 2, the canonical names of the nodes are as follows:

Node     Canonical Name
u1       …
u2       …

In the context of canonical abstraction, S shown in Fig. 2 represents S_b and S_c, but not
S_a; i.e., S represents lists that are pointed to by x that have at least three nodes, but it
does not represent a list with just two nodes. The reason is that predicates n and eq have
indefinite values in S, but a list with only two nodes cannot have both 0 and 1 values
for the corresponding entries, as required for minimizing information loss as defined in
Eq. (12). In contrast, according to the abstraction that relies on embedding, defined in
Section 2.4, S represents lists with two or more elements.
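The Python sketch below computes canonical abstraction as given by Eqs. (11) and (12) and applies it to linked lists of several lengths, which illustrates why S_a is not represented by S. Several choices here are assumptions made only for the illustration. Structures are dictionaries of 0/1 predicate tables, the string "1/2" stands for the indefinite value, and the vocabulary is reduced to the unary predicate x and the binary predicates n and eq; the actual vocabulary of Fig. 2 may contain further predicates. The helper names `join`, `canonical_abstraction`, and `make_list` are hypothetical.

```python
HALF = "1/2"  # the indefinite Kleene value


def join(values):
    """Kleene join: the common value if all values agree, otherwise 1/2."""
    vals = set(values)
    return vals.pop() if len(vals) == 1 else HALF


def canonical_abstraction(universe, unary, binary):
    """Return (canonical map, abstract unary tables, abstract binary tables)."""
    # Eq. (11): the canonical name lists the unary predicates true and false at u.
    canon = {u: (frozenset(p for p in unary if unary[p][u] == 1),
                 frozenset(p for p in unary if unary[p][u] == 0))
             for u in universe}
    abs_nodes = set(canon.values())
    # Eq. (12): each abstract entry is the join over all concrete preimages.
    abs_unary = {p: {a: join(unary[p][u] for u in universe if canon[u] == a)
                     for a in abs_nodes}
                 for p in unary}
    abs_binary = {p: {(a1, a2): join(binary[p][(v1, v2)]
                                     for v1 in universe for v2 in universe
                                     if canon[v1] == a1 and canon[v2] == a2)
                      for a1 in abs_nodes for a2 in abs_nodes}
                  for p in binary}
    return canon, abs_unary, abs_binary


def make_list(k):
    """A singly linked list of k >= 2 nodes whose head is pointed to by x."""
    universe = list(range(k))
    unary = {"x": {u: int(u == 0) for u in universe}}
    binary = {"n": {(a, b): int(b == a + 1) for a in universe for b in universe},
              "eq": {(a, b): int(a == b) for a in universe for b in universe}}
    return universe, unary, binary


U1 = (frozenset({"x"}), frozenset())  # canonical name of the head node
U2 = (frozenset(), frozenset({"x"}))  # canonical name of all remaining nodes

for k in (2, 3, 4):
    _, _, b = canonical_abstraction(*make_list(k))
    print(k, b["n"][(U1, U2)], b["n"][(U2, U2)], b["eq"][(U2, U2)])
```

For k = 2, the entries n(u1, u2), n(u2, u2), and eq(u2, u2) come out as 1, 0, and 1. For k = 3 and k = 4, all three entries are 1/2. Only lists with at least three nodes therefore produce the indefinite entries of S.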

To characterize canonical abstraction, we define the set of 3-valued structures that
are "images of canonical abstraction" (ICA), i.e., the results of applying canonical
abstraction to 2-valued structures.

Definition 14. (Image of canonical abstraction (ICA)) Structure $S$ is an ICA if there
exists a 2-valued structure $S^\natural$ such that $S$ is the canonical abstraction of $S^\natural$.



Concretization of 3-Valued Structures. Canonical abstraction allows us to define the
(potentially infinite) set of 2-valued structures represented by a set of 3-valued
structures that are ICA.

Definition 15. (Concretization of ICA Structures) For a set of structures $X \subseteq 3\text{-STRUCT}[\mathcal{P}]$
that are ICA structures, we denote by $\gamma_c(X)$ the set of 2-valued structures that $X$
represents, i.e.,

$$\gamma_c(X) = \{ S^\natural \in 2\text{-STRUCT}[\mathcal{P}] \mid \text{there exists } S \in X \text{ such that } S \text{ is the canonical abstraction of } S^\natural \text{ and } S^\natural \models F \} \qquad (13)$$

Also, for a singleton set $X = \{S\}$ we write $\gamma_c(S)$ instead of $\gamma_c(X)$.

The abstract domain is the powerset of ICA structures, where the order relation is
set inclusion. Note that this abstract domain is finite, because there is a finite number
of different ICA structures (up to isomorphism). Denote by $\alpha_c$ the extension of the
abstraction function canonical to sets. This defines a Galois connection $(\alpha_c, \gamma_c)$ between
sets of 2-valued structures and sets of ICA structures.

A.2 Canonical-FO-Identifiable Structures 

We define the notion of canonical-FO-identifiable nodes using canonical abstraction
rather than embedding, which was used for the notion of FO-identifiable nodes in
Definition 8.

Definition 16. We say that a node $u$ in a 3-valued structure $S$ is canonical-FO-identifiable
if there exists a formula $\mathit{node}^S_u(w)$ with designated free variable $w$, such that for every
2-valued structure $S^\natural$, if $S$ is the canonical abstraction of $S^\natural$, i.e., $S^\natural \in \gamma_c(S)$, then for
every concrete node $u^\natural \in U^{S^\natural}$:

$$\mathit{canonical}(u^\natural) = u \iff S^\natural, [w \mapsto u^\natural] \models \mathit{node}^S_u(w) \qquad (14)$$

$S$ is called canonical-FO-identifiable if all the nodes in $S$ are canonical-FO-identifiable.

We can also prove Lemma 1 for the case of canonical abstraction rather than
embedding.

Footnote: Eq. (12) is called the tight-embedding condition in [37].
A.3 Characterizing Canonical Abstraction

An ICA structure is always a bounded structure, in which all nullary and unary
predicates have definite values. This is formalized by the following lemma:

Lemma 4. If 3-valued structure $S = \langle U^S, \iota^S \rangle$ over vocabulary $\mathcal{P}$ is ICA then:

(i) $S$ is a bounded structure.

(ii) For each nullary predicate $p$, $\iota^S(p)() \in \{0, 1\}$.

(iii) For each element $u \in U^S$ and each unary predicate $p$, $\iota^S(p)(u) \in \{0, 1\}$.

The following lemma shows that ICA structures are canonical-FO-identifiable:

Lemma 5. Every 3-valued structure $S$ that is an ICA is canonical-FO-identifiable,
where

$$\mathit{node}^S_u(w) = \bigwedge_{p \in \mathcal{P}_1} p^{\iota^S(p)(u)}(w) \qquad (15)$$

Using this fact, we can define a formula that accepts exactly the set of 2-valued
structures represented by $S$ under canonical abstraction. The formula is merely $\widehat{\gamma}(S)$
with additional conjuncts to ensure that the information loss is minimized, i.e., for every
predicate $p$ and every $1/2$ entry of $p$, the 2-valued structure has both a corresponding 1
entry and a corresponding 0 entry.

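Definition 17. (First-Order Characteristic Formula for Canonical Abstraction) Let
3-valued structure $S = \langle U^S, \iota^S \rangle$ be an ICA.

For a predicate $p$ of arity $r$, we define the closed formula $\xi^S_{\max}[p]$ for $p$:

$$\xi^S_{\max}[p] \stackrel{\mathrm{def}}{=} \bigwedge_{\substack{u'_1, \ldots, u'_r \in U^S \\ \iota^S(p)(u'_1, \ldots, u'_r) = 1/2}} \Big( \exists w_1, \ldots, w_r : \bigwedge_{j=1}^{r} \mathit{node}^S_{u'_j}(w_j) \land p(w_1, \ldots, w_r) \Big) \land \Big( \exists w_1, \ldots, w_r : \bigwedge_{j=1}^{r} \mathit{node}^S_{u'_j}(w_j) \land \neg p(w_1, \ldots, w_r) \Big) \qquad (16)$$

The formula of $S$ is defined by:

$$\tau^S \stackrel{\mathrm{def}}{=} \widehat{\gamma}(S) \land \bigwedge_{p \in \mathcal{P}} \xi^S_{\max}[p] \qquad (17)$$

The characteristic formula for canonical abstraction of a set of ICA structures
$X \subseteq 3\text{-STRUCT}[\mathcal{P}]$ is defined by

$$\widehat{\gamma}_c(X) \stackrel{\mathrm{def}}{=} F \land \Big( \bigvee_{S \in X} \tau^S \Big) \qquad (18)$$

Also, for a singleton set $X = \{S\}$, where $S$ is an ICA structure, we write $\widehat{\gamma}_c(S)$ instead
of $\widehat{\gamma}_c(X)$.

Footnote: If not all unary predicates are defined as abstraction predicates, then the result may be a
bounded structure of the less restrictive kind mentioned in Section 3.1. Also, unary predicates
that are not abstraction predicates may have indefinite values.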
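Example 15. The characteristic formula for canonical abstraction of the structure $S$
shown in Fig. 2(d) is:

$$\begin{aligned}
\widehat{\gamma}_c(S) = \widehat{\gamma}(S)
&\land \exists w_1, w_2 : \mathit{node}^S_{u_1}(w_1) \land \mathit{node}^S_{u_2}(w_2) \land n(w_1, w_2) \\
&\land \exists w_1, w_2 : \mathit{node}^S_{u_1}(w_1) \land \mathit{node}^S_{u_2}(w_2) \land \neg n(w_1, w_2) \\
&\land \exists w_1, w_2 : \mathit{node}^S_{u_2}(w_1) \land \mathit{node}^S_{u_2}(w_2) \land n(w_1, w_2) \\
&\land \exists w_1, w_2 : \mathit{node}^S_{u_2}(w_1) \land \mathit{node}^S_{u_2}(w_2) \land \neg n(w_1, w_2) \\
&\land \exists w_1, w_2 : \mathit{node}^S_{u_2}(w_1) \land \mathit{node}^S_{u_2}(w_2) \land eq(w_1, w_2) \\
&\land \exists w_1, w_2 : \mathit{node}^S_{u_2}(w_1) \land \mathit{node}^S_{u_2}(w_2) \land \neg eq(w_1, w_2)
\end{aligned} \qquad (19)$$

where $\widehat{\gamma}(S)$ is given in Example 8. As explained in Example 14, $S$ does not represent a
list of two nodes. The corresponding 2-valued structure $S_a$, shown in Fig. 2(a), does not
satisfy Eq. (19), because the last four lines cannot all be satisfied by assignments in $S_a$.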
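The following Python sketch checks this claim by brute force. It evaluates only the six existential conjuncts of Eq. (19) and not $\widehat{\gamma}(S)$, on lists of length k. The encoding is an assumption: nodes 0..k-1, x points to node 0, n(a, b) holds iff b = a + 1, and the node formulas are reduced to $\mathit{node}_{u_1}(w) = x(w)$ and $\mathit{node}_{u_2}(w) = \neg x(w)$. The conjunct list in the code follows the line order of Eq. (19). The function name `xi_max_conjuncts` is hypothetical.

```python
from itertools import product


def xi_max_conjuncts(k):
    """Truth values of the six existential conjuncts of Eq. (19) on a list of length k."""
    node = {"u1": lambda w: w == 0,   # node_{u1}(w) = x(w), x points to node 0
            "u2": lambda w: w != 0}   # node_{u2}(w) = not x(w)

    def n(a, b):
        return b == a + 1

    def eq(a, b):
        return a == b

    # (first node, second node, predicate, required polarity), one per line of Eq. (19)
    conjuncts = [("u1", "u2", n, True), ("u1", "u2", n, False),
                 ("u2", "u2", n, True), ("u2", "u2", n, False),
                 ("u2", "u2", eq, True), ("u2", "u2", eq, False)]
    return [any(node[a](w1) and node[b](w2) and pred(w1, w2) == positive
                for w1, w2 in product(range(k), repeat=2))
            for a, b, pred, positive in conjuncts]


print(xi_max_conjuncts(2))
print(xi_max_conjuncts(3))
```

Under this encoding, the two-node list fails three conjuncts. One of them is the last line, which needs two distinct u2-nodes. Every list with at least three nodes satisfies all six conjuncts.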

Remark. The formula $\xi^S_{\max}[p]$ contains neither quantifier alternation nor transitive closure.
Therefore, $\widehat{\gamma}_c$ is in Existential-Universal normal form (and thus decidable) whenever $F$
is in Existential-Universal form and does not contain transitive closure.

Theorem 2. For every 3-valued structure $S$ that is an ICA and 2-valued structure $S^\natural$:

$$S^\natural \in \gamma_c(S) \iff S^\natural \models \widehat{\gamma}_c(S)$$

A.4 Closure Properties of ICA Structures

This section gives a simple semantic proof that the class of formulas that characterize
ICA structures is closed under negation. This result was shown in [24] using a different
formalism.

From Eq. (12) it follows that for two distinct ICA structures $S_1$ and $S_2$,
$\gamma_c(S_1) \cap \gamma_c(S_2) = \emptyset$. Intuitively, each 2-valued structure can be represented by exactly one ICA
structure. This implies that the complement of the concretization of an ICA structure
can be represented precisely by a finite set of ICA structures.

Denote by $\mathcal{D}$ the set of all 2-valued structures that satisfy the integrity formula $F$:
$\mathcal{D} = \{ S^\natural \in 2\text{-STRUCT}[\mathcal{P}] \mid S^\natural \models F \}$.

Lemma 6. Let $S$ be an ICA structure. There exists a set of ICA structures $X$ such that

$$\gamma_c(X) = \mathcal{D} \setminus \gamma_c(S).$$

This can be reformulated using Theorem 2 in terms of characteristic formulas for ICA
structures. This shows that the class of formulas that characterize canonical abstraction
is closed under negation, in the following sense:

Lemma 7. Consider the formula $\tau^S$ from Eq. (17), for some ICA structure $S$. There
exists a set of ICA structures $X$ such that the formula $F \land \neg \tau^S$ is equivalent to the
formula $\widehat{\gamma}_c(X)$.

Remark. Note that Lemma 6 and Lemma 7 do not hold for bounded structures,
described in Section 3.1. The reason, intuitively, is that some 2-valued structures can be
represented by more than one bounded structure.
For example, consider the 2-valued structure $S_a$ from Fig. 2, which denotes a linked
list of length exactly 2. It is in the concretization of two different bounded 3-valued structures. The
first is $S_a$ itself, considered as a 3-valued structure $S'$ that represents a single 2-valued
structure: $\gamma(S') = \{S_a\}$. The second is the structure $S$ from Fig. 2.

For the purpose of this example, assume that the integrity formula $F$ (that defines $\mathcal{D}$)
requires that all elements be reachable from $x$, in addition to the integrity formula $F_{\mathit{list}}$
from Example 2. The complement $C = \mathcal{D} \setminus \gamma(S') = \mathcal{D} \setminus \{S_a\}$ is the set that contains
an empty linked list, a linked list of length 1, and linked lists of length 3 or more. Suppose that $C$
is represented by a set $X$ of bounded structures. To capture linked lists of length 3
or more, $X$ must contain the 3-valued structure $S$ from Fig. 2. However, $\gamma(S)$ includes a
list of length 2 as well, namely $S_a$, which is not in $C$. Therefore, $\gamma(X) \neq C$.

B Characterizing General 3-Valued Structures by NP Formulas

In this section, we show how to characterize general 3-valued structures. 

B.1 Motivating Example

If the input structure is FO-identifiable, Theorem 1 ensures that the result of operation $\widehat{\gamma}$
precisely captures the concretization of the input structure. The purpose of this example
is to show what happens if we apply the $\widehat{\gamma}$ operation, as defined in Section 3, to a
structure that is not FO-identifiable. When $S$ is not FO-identifiable, $\widehat{\gamma}(S)$ only provides
a sufficient test for the embedding of 2-valued structures into $S$.

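Example 16. The 3-valued structure $S$ shown in Fig. 3 describes undirected graphs.
We draw undirected edges as two-way directed edges. This structure uses a set of
predicates $\mathcal{P} = \{eq, f, b\}$, where $f(v_1, v_2)$ and $b(v_2, v_1)$ denote the forward and backward
directions of an edge between nodes $v_1$ and $v_2$.

When Eq. (8) is applied to the 3-valued structure $S$ shown in Fig. 3, we get

$$\begin{aligned}
& \textstyle\bigwedge_{i=1}^{3} \exists v : \mathit{node}^S_{u_i}(v) \\
\land\ & \forall w : \textstyle\bigvee_{i=1}^{3} \mathit{node}^S_{u_i}(w) \\
\land\ & \forall w_1, w_2 : \textstyle\bigwedge_{i \neq j} \big( \mathit{node}^S_{u_i}(w_1) \land \mathit{node}^S_{u_j}(w_2) \Rightarrow f^{1/2}(w_1, w_2) \big) \\
\land\ & \forall w_1, w_2 : \textstyle\bigwedge_{i \neq j} \big( \mathit{node}^S_{u_i}(w_1) \land \mathit{node}^S_{u_j}(w_2) \Rightarrow b^{1/2}(w_1, w_2) \big) \\
\land\ & \forall w_1, w_2 : \textstyle\bigwedge_{i=1}^{3} \big( \mathit{node}^S_{u_i}(w_1) \land \mathit{node}^S_{u_i}(w_2) \Rightarrow b^{0}(w_1, w_2) \big) \\
\land\ & \forall w_1, w_2 : \textstyle\bigwedge_{i=1}^{3} \big( \mathit{node}^S_{u_i}(w_1) \land \mathit{node}^S_{u_i}(w_2) \Rightarrow f^{0}(w_1, w_2) \big)
\end{aligned} \qquad (20)$$

Because this example does not include unary predicates, the node formula given in
Lemma 3 evaluates to 1 on all elements. Hence, Eq. (20) can be simplified to:

$$\begin{aligned}
& \textstyle\bigwedge_{i=1}^{3} \exists v : 1 \\
\land\ & \forall w : \textstyle\bigvee_{i=1}^{3} 1 \\
\land\ & \forall w_1, w_2 : \textstyle\bigwedge_{i \neq j} (1 \land 1 \Rightarrow 1) \\
\land\ & \forall w_1, w_2 : \textstyle\bigwedge_{i \neq j} (1 \land 1 \Rightarrow 1) \\
\land\ & \forall w_1, w_2 : \textstyle\bigwedge_{i=1}^{3} (1 \land 1 \Rightarrow \neg b(w_1, w_2)) \\
\land\ & \forall w_1, w_2 : \textstyle\bigwedge_{i=1}^{3} (1 \land 1 \Rightarrow \neg f(w_1, w_2))
\end{aligned}$$

After further simplification, we get the formula $\forall w_1, w_2 : \neg f(w_1, w_2) \land \forall w_1, w_2 :
\neg b(w_1, w_2)$. The simplification is possible because the implication in Eq. (7)
holds unconditionally for all pairs of distinct abstract nodes, since $f$ and $b$ evaluate to $1/2$ on
those pairs. The only remaining requirement is the one imposed by the absence of self-loops in $S$.

This formula is only fulfilled by graphs with no edges, which are obviously
3-colorable. But this formula is too restrictive: it rejects every 3-colorable graph that has
at least one edge, such as a path on three nodes.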
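The over-restriction can be observed concretely with the following Python sketch, which evaluates Eq. (20) on small graphs. The graph encoding is an assumption: nodes are 0..n-1, edges are unordered pairs, and f and b both hold in either direction of an edge. Because the vocabulary has no unary predicates, the sketch fixes every node formula to true. The function name `fo_char_formula_holds` is hypothetical.

```python
from itertools import product


def fo_char_formula_holds(n, edges, k=3):
    """Evaluate Eq. (20) for the k-node structure of Fig. 3 on a concrete graph."""
    und = {frozenset(e) for e in edges}

    def f(a, c):  # forward direction of an undirected edge
        return a != c and frozenset((a, c)) in und

    def b(a, c):  # backward direction
        return f(c, a)

    def node(i, w):  # no unary predicates: the node formula is an empty conjunction
        return True

    W = range(n)
    nonempty = all(any(node(i, w) for w in W) for i in range(k))
    total = all(any(node(i, w) for i in range(k)) for w in W)
    # Conjuncts with i != j conclude f^{1/2} or b^{1/2}, i.e. the constant 1,
    # so only the conjuncts with i == j (no self-loops on u_i) can fail.
    same_node = all(not (node(i, w1) and node(i, w2))
                    or (not f(w1, w2) and not b(w1, w2))
                    for i in range(k) for w1, w2 in product(W, repeat=2))
    return nonempty and total and same_node


print(fo_char_formula_holds(3, []))                # edgeless graph
print(fo_char_formula_holds(3, [(0, 1), (1, 2)]))  # a 3-colorable path
```

The edgeless graph is accepted and the path 0-1-2 is rejected, even though the path is 3-colorable.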

B.2 Characterizing General 3-Valued Structures

Existential monadic second-order formulas are a subset of Fagin's second-order
formulas [11], named NP formulas, which capture NP computations. A formula in existential
monadic second-order logic has the form:

$$\exists V_1, V_2, \ldots, V_n : \varphi$$

where the $V_i$ are set variables, and $\varphi$ is a first-order formula that can use membership
tests in $V_i$. We show that in this subset of second-order logic, the characteristic formula
from Definition 11 can be generalized to handle arbitrary 3-valued structures using
existential quantification over set variables (with one set variable for each abstract node).

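Definition 18. (NP Characteristic Formula) Let $S = \langle U^S = \{u_1, u_2, \ldots, u_n\}, \iota^S \rangle$ be a
3-valued structure.

We define the following formula to ensure that the sets are non-empty:

$$\xi_{\mathit{non\_empty}}[i] \stackrel{\mathrm{def}}{=} \exists w : \mathit{node}^S_{u_i}(w) \qquad (21)$$

We define the following formula to ensure that the sets $V_k$, $V_j$ are disjoint:

$$\xi_{\mathit{disjoint}}[k, j] \stackrel{\mathrm{def}}{=} \forall w_1, w_2 : \mathit{node}^S_{u_k}(w_1) \land \mathit{node}^S_{u_j}(w_2) \Rightarrow \neg eq(w_1, w_2) \qquad (22)$$

The NP characteristic formula of $S$ is defined by:

$$\widehat{\gamma}^S_{NP} \stackrel{\mathrm{def}}{=} \exists V_1, \ldots, V_n : \bigwedge_{i=1}^{n} \xi_{\mathit{non\_empty}}[i] \land \bigwedge_{k \neq j} \xi_{\mathit{disjoint}}[k, j] \land \xi^S_{\mathit{total}} \land \xi^S_{\mathit{nullary}} \land \bigwedge_{p \in \mathcal{P}} \xi^S[p] \qquad (23)$$

where $\xi^S_{\mathit{total}}$, $\xi^S_{\mathit{nullary}}$, and $\xi^S[p]$ are defined as in Definition 11, except that $\mathit{node}^S_{u_i}$ is the NP
formula $\mathit{node}^S_{u_i}(w) = (w \in V_i)$. (Here, we abuse notation slightly by referring to $V_i$ in
$\mathit{node}^S_{u_i}(w)$. This could have been formalized by passing $V_1, \ldots, V_n$ as extra parameters
to $\mathit{node}^S_{u_i}$.)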

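The NP characteristic formula of a finite set $X \subseteq 3\text{-STRUCT}[\mathcal{P}]$ is defined by:

$$\widehat{\gamma}_{NP}(X) \stackrel{\mathrm{def}}{=} F \land \Big( \bigvee_{S \in X} \widehat{\gamma}^S_{NP} \Big) \qquad (24)$$

Finally, for a singleton set $X = \{S\}$ we write $\widehat{\gamma}_{NP}(S)$ instead of $\widehat{\gamma}_{NP}(X)$.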



Footnote: This result is mostly theoretical. In principle, this encoding falls into monadic second-order
logic, which is decidable if we restrict the concrete structures of interest to trees. However, we
have not investigated this direction further.
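Example 17. After a small amount of simplification, the NP characteristic formula $\widehat{\gamma}_{NP}(S)$
for the graph shown in Fig. 3 is:

$$\begin{aligned}
\exists V_1, V_2, V_3 :\ & \textstyle\bigwedge_{i=1}^{3} \exists w : w \in V_i && \text{(i)} \\
\land\ & \textstyle\bigwedge_{k \neq j} \forall w_1, w_2 : (w_1 \in V_k \land w_2 \in V_j \Rightarrow \neg eq(w_1, w_2)) && \text{(ii)} \\
\land\ & \forall w : \textstyle\bigvee_{i=1}^{3} w \in V_i && \text{(iii)} \\
\land\ & \forall w_1, w_2 : \textstyle\bigwedge_{i=1}^{3} \big( \textstyle\bigwedge_{j=1,2} w_j \in V_i \Rightarrow \neg f(w_1, w_2) \land \neg b(w_2, w_1) \big) && \text{(iv)}
\end{aligned}$$

In this formula, $V_1$, $V_2$, and $V_3$ represent the three color classes. Line by line, the
formula says: (i) each color class has at least one member; (ii) the color classes are pairwise
disjoint; (iii) every node is in a color class; (iv) nodes in the same color class are not
connected by an undirected edge.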
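The following Python sketch evaluates the four conjuncts by enumerating candidate values for $V_1, V_2, V_3$, which shows how the NP formula differs from the first-order one. Conjuncts (ii) and (iii) force the sets to partition the nodes, so enumerating maps from nodes to class indices covers every relevant assignment. The graph encoding (nodes 0..n-1, edges as unordered pairs) and the function name `np_char_formula_holds` are assumptions made for illustration.

```python
from itertools import product


def np_char_formula_holds(n, edges, k=3):
    """Brute-force evaluation of conjuncts (i)-(iv) of Example 17."""
    und = {frozenset(e) for e in edges}

    def f(a, c):  # forward direction of an undirected edge
        return a != c and frozenset((a, c)) in und

    def b(a, c):  # backward direction
        return f(c, a)

    # Each map cls assigns node w to set variable V_cls[w]; (ii) and (iii) hold by construction.
    for cls in product(range(k), repeat=n):
        V = [{w for w in range(n) if cls[w] == i} for i in range(k)]
        nonempty = all(V)  # (i): every V_i has a member
        # (iv): no edge, in either direction, between two members of the same V_i
        no_inner_edge = all(not (f(w1, w2) or b(w2, w1))
                            for Vi in V for w1 in Vi for w2 in Vi)
        if nonempty and no_inner_edge:
            return True
    return False


print(np_char_formula_holds(3, [(0, 1), (1, 2)]))  # the path rejected by Eq. (20)
```

The path on three nodes, which the first-order formula rejects, now satisfies the formula. The complete graph on four nodes still does not.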

The following theorem generalizes the result in Theorem 1 for an arbitrary 3-valued
structure $S$, using the NP formula $\widehat{\gamma}_{NP}(S)$ to accept exactly the set of 2-valued structures
represented by $S$.

Theorem 3. For every 3-valued structure $S$, and 2-valued structure $S^\natural$:

$$S^\natural \in \gamma(S) \iff S^\natural \models \widehat{\gamma}_{NP}(S)$$

C Generating and Querying a Loop Invariant

Table 2 and Table 3 show the structures and the characteristic formulas for the
experiment described in Example 11.

It is interesting to note that the size of $\widehat{\gamma}(S_2)$ is bigger than the size of $\widehat{\gamma}(S_1)$. This is
natural because $S_2$ has more definite values, which impose more restrictions than are
imposed by $S_1$.

D Proofs

Lemma D1. Consider the 3-valued structure $S$ shown in Fig. 3. For all 2-valued
structures $C$, $C$ can be embedded into $S$ if and only if $C$ can be colored using 3 colors.

Proof of the if direction: Suppose that $C$ is 3-colorable, and let $c$ be a mapping from the
nodes of $C$ to the colors $\{1, 2, 3\}$. We define an embedding function $f$ from $C$ to $S$ as
follows: $f(u) = u_{c(u)}$, i.e., a node $u \in C$ that has color $i$ is mapped to $u_i \in S$. It is
easy to see that $f$ preserves predicate values in $S$, because the only definite values in $S$
indicate the absence of self-loops. This absence is preserved, because there are no edges in $C$ with
both endpoints of the same color.

Proof of the only-if direction: Suppose that $C$ is embedded into $S$ using $f$. We show that
$C$ is 3-colorable. For each node $u \in C$, let the color of $u$, $c(u)$, be the name of the
corresponding node in $S$, i.e., $c(u) = f(u)$. The absence of self-loops on any of the three
summary nodes guarantees that a pair of adjacent nodes in $C$ cannot be mapped by $f$ to
the same summary node. That is, for any edge in $C$, the endpoints must be mapped by
$f$ to different summary nodes; thus, they have different colors.
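The lemma can be checked on small cases with the following Python sketch. It searches for an embedding function that is surjective and preserves values in the information order. The encoding of $S$ is an assumption based on the description of Fig. 3. There are k summary nodes, with eq equal to 1/2 on each (u_i, u_i) and 0 elsewhere, and f and b equal to 0 on self-pairs and 1/2 on all other pairs. Because an embedding must be surjective, the exhaustive test uses graphs with four nodes; graphs with fewer than three nodes cannot be embedded at all. All function names are hypothetical.

```python
from itertools import combinations, product

HALF = "1/2"  # indefinite Kleene value


def leq(a, b):
    """Information order: a is below b iff a == b or b is 1/2."""
    return a == b or b == HALF


def fig3_structure(k=3):
    """Assumed encoding of S from Fig. 3: k summary nodes and no self-loops."""
    nodes = list(range(k))
    eq = {(i, j): HALF if i == j else 0 for i in nodes for j in nodes}
    edge = {(i, j): 0 if i == j else HALF for i in nodes for j in nodes}
    return nodes, {"eq": eq, "f": edge, "b": edge}


def concrete_graph(n, edges):
    """2-valued structure of an undirected graph; f and b hold in both directions."""
    nodes = list(range(n))
    und = {frozenset(e) for e in edges}
    eq = {(x, y): int(x == y) for x in nodes for y in nodes}
    f = {(x, y): int(x != y and frozenset((x, y)) in und) for x in nodes for y in nodes}
    b = {(x, y): f[(y, x)] for x in nodes for y in nodes}
    return nodes, {"eq": eq, "f": f, "b": b}


def embeds(concrete, abstract):
    """Search for a surjective h such that every concrete value is below the abstract one."""
    c_nodes, c_preds = concrete
    a_nodes, a_preds = abstract
    for h in product(a_nodes, repeat=len(c_nodes)):
        if set(h) != set(a_nodes):
            continue  # embedding functions must be surjective
        if all(leq(c_preds[p][(x, y)], a_preds[p][(h[x], h[y])])
               for p in c_preds for x in c_nodes for y in c_nodes):
            return True
    return False


def three_colorable(n, edges):
    return any(all(c[x] != c[y] for x, y in edges)
               for c in product(range(3), repeat=n))


# Compare both notions on all 64 graphs over four nodes.
pairs = list(combinations(range(4), 2))
for mask in range(1 << len(pairs)):
    es = [pairs[i] for i in range(len(pairs)) if mask >> i & 1]
    assert embeds(concrete_graph(4, es), fig3_structure()) == three_colorable(4, es)
```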

Lemma 1. Let $S$ be an FO-identifiable structure and let $u_1, u_2 \in S$ be distinct individuals.
Let $S^\natural$ be a 2-valued structure that embeds into $S$ and let $u^\natural \in S^\natural$. At most one of
the following can hold:
[Table 2 body: structures $S_1$, $S_2$, and $S_3$ (left column), each with its node formulas $\mathit{node}^S_{u_i}(w)$ over the unary predicates $x$, $y$, $t$, $e$, $r_x$, $r_y$, $r_t$, $r_e$, and $is$, and its characteristic formula over the binary predicates $n$ and $eq$ (right column).]

Table 2. (Continued in Table 3.) The left column shows the structures that arise at the beginning
of the loop in the insert program from Fig. 1(b). The right column shows the characteristic
formula for each structure. Note that we omit the sub-formulas $\xi^S[p]$, for $p \in \mathcal{P}_1$, which
are implied by $\xi^S_{\mathit{total}}$ and the $\mathit{node}^S_{u_i}(w)$ definitions.
[Table 3 body: two further structures, with four and three nodes respectively (left column), each with its node formulas and characteristic formula over the same predicates as in Table 2 (right column).]

Table 3. Table 2 continued.
1. $S^\natural, [w \mapsto u^\natural] \models \mathit{node}^S_{u_1}(w)$

2. $S^\natural, [w \mapsto u^\natural] \models \mathit{node}^S_{u_2}(w)$

Proof. Because $S^\natural$ embeds into $S$, there exists an embedding function $f$ such that
$S^\natural \sqsubseteq^f S$. For the sake of argument, assume that both claims hold. By Definition 8, we
get that $f(u^\natural) = u_1$ and $f(u^\natural) = u_2$; because $f$ is a function, we get that $u_1 = u_2$.
This contradicts the assumption that $u_1$ and $u_2$ are distinct individuals.

Lemma 2. For every 2-valued structure $S^\natural$ and assignment $Z$:

$$S^\natural, Z \models p^B(v_1, v_2, \ldots, v_k) \iff \iota^{S^\natural}(p)(Z(v_1), Z(v_2), \ldots, Z(v_k)) \sqsubseteq B$$

Proof of the if direction: Suppose that $\iota^{S^\natural}(p)(Z(v_1), Z(v_2), \ldots, Z(v_k)) \sqsubseteq B$. There
are two cases to consider: (i) $B = 1/2$ or (ii) $\iota^{S^\natural}(p)(Z(v_1), Z(v_2), \ldots, Z(v_k)) = B$.
If $B = 1/2$, then by Definition 9, $p^B(v_1, v_2, \ldots, v_k) = 1$, and thus $S^\natural, Z \models
p^B(v_1, v_2, \ldots, v_k)$ for all $Z$. If $B = 1$, then $\iota^{S^\natural}(p)(Z(v_1), Z(v_2), \ldots, Z(v_k)) = 1$;
thus $S^\natural, Z \models p(v_1, v_2, \ldots, v_k)$, which is $S^\natural, Z \models p^B(v_1, v_2, \ldots, v_k)$ by Definition 9.
Similarly, if $B = 0$, then $\iota^{S^\natural}(p)(Z(v_1), Z(v_2), \ldots, Z(v_k)) = 0$ implies that $S^\natural, Z \models
\neg p(v_1, v_2, \ldots, v_k) = p^B(v_1, v_2, \ldots, v_k)$.

Proof of the only-if direction: Assume that $S^\natural, Z \models p^B(v_1, v_2, \ldots, v_k)$. If $B = 1/2$,
then $\iota^{S^\natural}(p)(Z(v_1), Z(v_2), \ldots, Z(v_k)) \sqsubseteq B$ trivially holds. If $B = 0$, apply
Definition 9 to the assumption to get $S^\natural, Z \models \neg p(v_1, v_2, \ldots, v_k)$, which implies
$\iota^{S^\natural}(p)(Z(v_1), Z(v_2), \ldots, Z(v_k)) = 0 = B$. Similarly, if $B = 1$, the assumption
implies $\iota^{S^\natural}(p)(Z(v_1), Z(v_2), \ldots, Z(v_k)) = 1 = B$.
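Lemma 2 reduces to a finite case analysis on a single atomic fact. The Python sketch below checks that analysis exhaustively. It assumes the reading of Definition 9 used in the proof above ($p^1 = p$, $p^0 = \neg p$, $p^{1/2} = 1$) and represents $1/2$ by the string "1/2". The names `leq` and `holds_p_B` are hypothetical.

```python
HALF = "1/2"


def leq(a, b):
    """Information order on Kleene values: a is below b iff a == b or b == 1/2."""
    return a == b or b == HALF


def holds_p_B(B, value):
    """Truth of p^B at a tuple on which a 2-valued structure gives p the value `value`."""
    if B == HALF:
        return True        # p^{1/2} is the constant 1
    return value == B      # p^1 is p, and p^0 is the negation of p


# Lemma 2, case by case: both concrete values against all three values of B
for value in (0, 1):
    for B in (0, 1, HALF):
        assert holds_p_B(B, value) == leq(value, B)
```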
Lemma 3. Every bounded 3-valued structure $S$ is FO-identifiable, where

$$\mathit{node}^S_u(w) = \bigwedge_{p \in \mathcal{P}_1} p^{\iota^S(p)(u)}(w)$$

Proof: Consider a bounded 3-valued structure $S = \langle U, \iota^S \rangle$. We shall show that every
element $u \in U$ is FO-identifiable using the formula defined in Eq. (4). Let $S^\natural$ be a
2-valued structure that embeds into $S$ using a function $f$, and let $u^\natural$ be a concrete element
in $U^{S^\natural}$. By Definition 8, we have to show that the following holds:

$$f(u^\natural) = u \iff S^\natural, [w \mapsto u^\natural] \models \mathit{node}^S_u(w)$$

Proof of the if direction: Suppose that $S^\natural, [w \mapsto u^\natural] \models \mathit{node}^S_u(w)$. In particular,
each conjunct of $\mathit{node}^S_u$ must hold, i.e., for each predicate $p \in \mathcal{P}_1$, $S^\natural, [w \mapsto u^\natural] \models
p^{\iota^S(p)(u)}(w)$. Using Lemma 2, we get that $\iota^{S^\natural}(p)(u^\natural) \sqsubseteq \iota^S(p)(u)$. In addition, the
embedding condition in Eq. (1) requires, in particular, that for each unary predicate $p$,
$\iota^{S^\natural}(p)(u^\natural) \sqsubseteq \iota^S(p)(f(u^\natural))$ holds. Let $u_1 = f(u^\natural)$. For the sake of argument, assume
that $u_1 \neq u$. Recall that $S$ is a bounded structure, in which every individual must have
a unique combination of definite values of unary predicates. As a consequence, there
must be a unary predicate $p$ such that $\iota^S(p)(u_1) \neq \iota^S(p)(u)$ and the value of $p$ on both
$u_1$ and $u$ is definite. This yields a contradiction, because $\sqsubseteq$ on definite values implies
equality; however, $\iota^{S^\natural}(p)(u^\natural) = \iota^S(p)(u)$ and $\iota^{S^\natural}(p)(u^\natural) = \iota^S(p)(f(u^\natural)) = \iota^S(p)(u_1)$
cannot hold simultaneously, by the assumption.

Proof of the only-if direction: Suppose that $f(u^\natural) = u$. Using Eq. (1), the embedding
function $f$ guarantees that for each unary predicate $p$, $\iota^{S^\natural}(p)(u^\natural) \sqsubseteq \iota^S(p)(f(u^\natural))$.
This means that $S^\natural, [w \mapsto u^\natural] \models p^{\iota^S(p)(f(u^\natural))}(w)$ by Lemma 2, or $S^\natural, [w \mapsto u^\natural] \models
p^{\iota^S(p)(u)}(w)$ by the assumption. This holds for all unary predicates, and thus holds for
their conjunction as well, namely, for the formula $\mathit{node}^S_u$.

Lemma D2. Given a set of formulas $\mathcal{F}$ and a 3-valued structure $S$, if the "focus"
algorithm [26, Sec. 6] terminates, it returns a set of structures $X$ such that $\gamma(S) = \gamma(X)$ and
every formula $\varphi \in \mathcal{F}$ evaluates, using the compositional semantics, to a definite value
in every structure in $X$, for every assignment. If the input structure $S$ is FO-identifiable,
then all structures in $X$ are FO-identifiable.

Proof: By induction on the iterations of the loop in the "focus" algorithm, it is
sufficient to show that the structures returned by the procedure FocusAssignment from
[26, Fig. 17] are FO-identifiable. The only interesting case is when the input literal of
FocusAssignment is of the form $p(u_1, \ldots, u_k)$. The resulting set of structures $X$
is $\{S_0, S_1, S''\}$, where $S_0$ and $S_1$ are copies of $S$ with $p(u_1, \ldots, u_k)$ set to 0 and 1,
respectively. Thus, if $S$ is FO-identifiable, then $S_0$ and $S_1$ are FO-identifiable. $S''$ is the
result of splitting a node $u_i \in S$ into $u_i.0$ and $u_i.1$, and setting $p(u_1, \ldots, u_k)$ to 0 on
one of the copies and to 1 on the other. To simplify the exposition, suppose that the
first node $u_1$ is split. Then $S''$ is FO-identifiable using the formulas $\mathit{node}^S_u(w)$ for all $u$
except $u_1.0$ and $u_1.1$, and

$$\mathit{node}^{S''}_{u_1.0}(w) = \exists v_2, \ldots, v_k .\ \neg p(w, v_2, \ldots, v_k) \land \mathit{node}^S_{u_1}(w) \land \bigwedge_{j=2}^{k} \mathit{node}^S_{u_j}(v_j)$$
$$\mathit{node}^{S''}_{u_1.1}(w) = \exists v_2, \ldots, v_k .\ p(w, v_2, \ldots, v_k) \land \mathit{node}^S_{u_1}(w) \land \bigwedge_{j=2}^{k} \mathit{node}^S_{u_j}(v_j)$$

Theorem 1. For every FO-identifiable 3-valued structure $S$, and 2-valued structure $S^\natural$:

$$S^\natural \in \gamma(S) \iff S^\natural \models \widehat{\gamma}(S)$$

Proof: In Lemma D3, we show that the if-direction holds even when $S$ is not
FO-identifiable, i.e., every concrete structure satisfying the characteristic formula $\widehat{\gamma}(S)$ is
indeed in $\gamma(S)$. In Lemma D4, we show the only-if part, i.e., for an FO-identifiable
structure, the other direction is also true.

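Lemma D3. Let $S$ be a first-order structure with set of individuals $U = \{u_1, u_2, \ldots, u_n\}$.
Let $\mathit{node}^S_{u_i}(w)$ used in $\widehat{\gamma}(S)$ be an arbitrary first-order formula free in $w$, such that
Lemma 1 holds. Then, for all $S^\natural$ such that $S^\natural \models \widehat{\gamma}(S)$, $S^\natural \in \gamma(S)$.

Proof: Let $S^\natural = \langle U^\natural, \iota^\natural \rangle$ be a concrete structure such that $S^\natural \models \widehat{\gamma}(S)$. We shall
construct a surjective function $f : U^\natural \to U$ such that $S^\natural \sqsubseteq^f S$. Let $Z^\natural$ be an assignment
over $v_1, \ldots, v_n$ such that $S^\natural, Z^\natural \models \varphi$, where $\varphi = \bigwedge_{i=1}^{n} \mathit{node}^S_{u_i}(v_i)$, i.e., $\varphi$ is the first
line of Eq. (8) without the existential quantification. Note that all $Z^\natural(v_i)$ are distinct,
according to Lemma 1. Define the function $f : U^\natural \to U$ by:

$$f(u^\natural) = \begin{cases} u_i & \text{if } Z^\natural(v_i) = u^\natural \\ u_j & \text{if for all } i,\ Z^\natural(v_i) \neq u^\natural, \text{ and } u_j \text{ is an arbitrary element such that } S^\natural, [w \mapsto u^\natural] \models \mathit{node}^S_{u_j}(w) \end{cases} \qquad (25)$$

Let us show that every concrete element is mapped to some element in $U$. In the
case that $Z^\natural(v_i) = u^\natural$, the concrete element is mapped to $u_i \in U$ by $f$. Otherwise,
because $S^\natural \models \xi^S_{\mathit{total}}$ holds, at least one of its disjuncts must be satisfied by each $u^\natural$,
i.e., $S^\natural, [w \mapsto u^\natural]$ must satisfy $\mathit{node}^S_{u_j}(w)$ for some $u_j$; thus, $f$'s definition will map $u^\natural$
to this $u_j$. Therefore, $f(u^\natural)$ is well-defined.

In addition, every element $u_i \in U$ is assigned by $f$ to some concrete element $u^\natural_i \in
U^\natural$ such that $Z^\natural(v_i) = u^\natural_i$. According to Lemma 1, all such elements $u^\natural_i$ are different.
Therefore, $f$ is surjective.

Let $p$ be a nullary predicate. Because $S^\natural$ satisfies $\xi^S_{\mathit{nullary}}$, $S^\natural$ must satisfy each
conjunct; in particular, $S^\natural \models p^{\iota^S(p)()}$. Using Lemma 2, we get that $\iota^\natural(p)() \sqsubseteq \iota^S(p)()$.

Let $p \in \mathcal{P}$ be a predicate of arity $r \geq 1$. Let $u^\natural_1, u^\natural_2, \ldots, u^\natural_r \in U^\natural$, and let us show
that

$$\iota^\natural(p)(u^\natural_1, u^\natural_2, \ldots, u^\natural_r) \sqsubseteq \iota^S(p)(f(u^\natural_1), f(u^\natural_2), \ldots, f(u^\natural_r)) \qquad (26)$$

Let $Z$ be an assignment such that $Z(w_i) = u^\natural_i$ for $i = 1, \ldots, r$. Because $S^\natural \models \xi^S[p]$, we
conclude that $S^\natural, Z$ satisfies the body of Eq. (7). Consider the conjunct of the body with
premise $\bigwedge_{j=1}^{r} \mathit{node}^S_{f(u^\natural_j)}(w_j)$. By the definition of $f$, $S^\natural, [w_j \mapsto u^\natural_j]$ satisfies $\mathit{node}^S_{f(u^\natural_j)}(w_j)$
for all $j = 1, \ldots, r$, which means that the premise is satisfied by $S^\natural, Z$. Therefore,
the conclusion must hold: $S^\natural, Z \models p^{\iota^S(p)(f(u^\natural_1), \ldots, f(u^\natural_r))}(w_1, \ldots, w_r)$, and the result
follows from Lemma 2.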

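Lemma D4. For every 3-valued FO-identifiable structure $S$, and 2-valued structure $S^\natural$
such that $S^\natural \models F$ and $S^\natural \sqsubseteq^f S$ for some function $f$, we have $S^\natural \models \widehat{\gamma}(S)$.

Proof: Let $f : U^\natural \to U$ be a surjective function such that $S^\natural \sqsubseteq^f S$. For each $i$, let $u^\natural_i$ be an arbitrary
element such that $f(u^\natural_i) = u_i$; such an element must exist because $f$ is surjective. Define an
assignment $Z^\natural$ such that $Z^\natural(v_i) = u^\natural_i$. Because $S$ is FO-identifiable, by Definition 8 we conclude
that for every $1 \le i \le n$, $S^\natural, Z^\natural \models \mathit{node}^S_{u_i}(v_i)$. Because $f$ is a function, all $u^\natural_i$ are
distinct elements, according to Lemma 1.

Because $f$ is a total function, for every $u^\natural$ there is $u$ such that $f(u^\natural) = u$. Then, by
Definition 8, $S^\natural, [w \mapsto u^\natural] \models \mathit{node}^S_u(w)$, i.e., every assignment to $w$ in $S^\natural$ satisfies
some disjunct of $\xi^S_{\mathit{total}}$. That is, $S^\natural$ satisfies $\xi^S_{\mathit{total}}$.

For every nullary predicate $p \in \mathcal{P}_0$, using Eq. (1) and Lemma 2, we conclude that
$S^\natural$ satisfies $p^{\iota^S(p)()}$. Therefore, $S^\natural$ satisfies $\xi^S_{\mathit{nullary}}$.

Let $p \in \mathcal{P}$ be a predicate of arity $r$. Let $u^\natural_1, \ldots, u^\natural_r \in U^\natural$, and let $Z^\natural$ be an assignment
such that $Z^\natural(w_i) = u^\natural_i$. We shall show that $S^\natural, Z^\natural$ satisfies the body of Eq. (7). If the
premise of the implication is not satisfied, then the formula vacuously holds. Otherwise,
$S^\natural, Z^\natural \models \mathit{node}^S_{u_j}(w_j)$ for all $j = 1, \ldots, r$. Then, by Definition 8, $f(u^\natural_j) = u_j$. Using
Eq. (1) on $f$, we get $\iota^\natural(p)(u^\natural_1, \ldots, u^\natural_r) \sqsubseteq \iota^S(p)(f(u^\natural_1), \ldots, f(u^\natural_r))$, which means that
$\iota^\natural(p)(u^\natural_1, \ldots, u^\natural_r) \sqsubseteq \iota^S(p)(u_1, \ldots, u_r)$ holds. By Lemma 2, we conclude that $S^\natural, Z^\natural$
satisfies $p^{\iota^S(p)(u_1, \ldots, u_r)}(w_1, \ldots, w_r)$.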

Lemma 4. If 3-valued structure $S = \langle U, \iota^S \rangle$ over vocabulary $\mathcal{P}$ is ICA then:

(i) $S$ is a bounded structure.

(ii) For each nullary predicate $p$, $\iota^S(p)() \in \{0, 1\}$.

(iii) For each element $u \in U$, and each unary predicate $p$, $\iota^S(p)(u) \in \{0, 1\}$.

Proof: Let $S^\natural = \langle U^\natural, \iota^\natural \rangle$ be a 2-valued structure such that $S$ is the canonical
abstraction of $S^\natural$. Let $\mathit{canonical} : U^\natural \to U$ be the mapping that identifies $S$ as the canonical
abstraction of $S^\natural$.

(i) Show that $S$ is a bounded structure. By Eq. (11), every abstract element represents
concrete elements with the same canonical name. Thus, for two distinct abstract
elements $u_0, u_1 \in U^S$, the canonical name of concrete elements represented by
$u_0$ is different from the canonical name of concrete elements represented by $u_1$.
Without loss of generality, assume that the canonical names differ in a unary
predicate $p$, such that $p$ evaluates to 0 on all concrete elements represented by $u_0$, and $p$
evaluates to 1 on all concrete elements represented by $u_1$. From the join operation
in Eq. (12), it follows that the value of $p$ on $u_0$ must be 0 and the value of $p$ on $u_1$
must be 1. This shows that, in general, every pair of distinct elements in $S$ differs
in a definite value of some unary predicate, proving that $S$ is a bounded structure.

(ii) Let $p$ be a nullary predicate. Show that $\iota^S(p)() \in \{0, 1\}$. By Eq. (12), $\iota^S(p)() =
\bigsqcup \{\iota^\natural(p)()\} = \iota^\natural(p)()$. This means that $p$ has the same value in $S$ and $S^\natural$. Because
$S^\natural$ is a concrete structure, the value of $p$ must be definite.

(iii) Let $p$ be a unary predicate and let $u \in U$. Show that $\iota^S(p)(u) \in \{0, 1\}$. Suppose
that the opposite holds: $\iota^S(p)(u) = 1/2$. By Eq. (12), there exist two concrete
elements, denoted by $u_0$ and $u_1$, such that $\mathit{canonical}(u_0) = u$ and $\mathit{canonical}(u_1) = u$,
and $p$ evaluates to 0 on $u_0$ and to 1 on $u_1$. Hence, these concrete elements have
different canonical names, and by Eq. (11) they cannot be mapped by canonical to
the same abstract element; this contradicts the supposition, and hence $\iota^S(p)(u) \in
\{0, 1\}$.
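Parts (i) and (iii) of Lemma 4 can be tested empirically with the Python sketch below. It draws random 2-valued structures, computes their canonical abstraction as in Eqs. (11) and (12), and checks that every unary value is definite and that distinct abstract nodes differ on some definite unary value. The random generator, the vocabulary (three unary predicates and one binary predicate), and all function names are assumptions made for illustration. Part (ii) is omitted because joining a single nullary value is immediate.

```python
import random

HALF = "1/2"  # indefinite Kleene value


def join(values):
    vals = set(values)
    return vals.pop() if len(vals) == 1 else HALF


def canonical_abstraction(universe, unary, binary):
    # Eq. (11): name each concrete node by the unary predicates true/false at it
    canon = {u: (frozenset(p for p in unary if unary[p][u] == 1),
                 frozenset(p for p in unary if unary[p][u] == 0)) for u in universe}
    abs_nodes = set(canon.values())
    # Eq. (12): join the values of all preimages
    abs_unary = {p: {a: join(unary[p][u] for u in universe if canon[u] == a)
                     for a in abs_nodes} for p in unary}
    abs_binary = {p: {(a1, a2): join(binary[p][(v1, v2)] for v1 in universe for v2 in universe
                                     if canon[v1] == a1 and canon[v2] == a2)
                      for a1 in abs_nodes for a2 in abs_nodes} for p in binary}
    return abs_nodes, abs_unary, abs_binary


def random_structure(rng, size, unary_names=("x", "y", "r"), binary_names=("n",)):
    universe = list(range(size))
    unary = {p: {u: rng.randint(0, 1) for u in universe} for p in unary_names}
    binary = {p: {(a, b): rng.randint(0, 1) for a in universe for b in universe}
              for p in binary_names}
    return universe, unary, binary


def satisfies_lemma4(abs_nodes, abs_unary):
    # (iii): every unary value in the abstraction is definite
    definite = all(v in (0, 1) for tbl in abs_unary.values() for v in tbl.values())
    # (i): distinct abstract nodes differ on a definite value of some unary predicate
    bounded = all(any(abs_unary[p][a] != abs_unary[p][b]
                      and HALF not in (abs_unary[p][a], abs_unary[p][b])
                      for p in abs_unary)
                  for a in abs_nodes for b in abs_nodes if a != b)
    return definite and bounded


rng = random.Random(0)
for _ in range(300):
    nodes, au, _ = canonical_abstraction(*random_structure(rng, rng.randint(1, 6)))
    assert satisfies_lemma4(nodes, au)
```

With three unary predicates, a canonical abstraction has at most $2^3 = 8$ nodes, whatever the size of the concrete structure. This is the a priori bound on size mentioned in Section A.1.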

Lemma 5 Every 3-valued structure S that is an ICA is canonical-FO-identifiable, 
where 

nodel^{w)= f\p''^P^^'^'\w) (27) 

Proof: Let S = {U, i^} be a 3-valued structure that is ICA. We shall show that every 
element u e C/ is canonical-FO-identifiable using the formula defined in Eq. (15). Let 
= {(7'', t'^" } be a 2- valued structure, such that S is the canonical abstraction of S^, 
induced by a function canonical, and let S U^'^ . By Definition 16, we have to show 
that the following holds: 

canonical{u^) = u <^=^ S^,[w ^ vJ^] \= nodef (w) 

Proof of the if direction: Suppose that S^,[w v^] \= nodef (w;). Let ui = canonical {u^). 
For the sake of argument, assume that ui ^ u. S is an ICA and using Lemma 4(i) we 
get that 5 is a bounded structure. By Definition 10, there exists a unary predicate p that 
evaluates to different definite values on u and ui. Without loss of generality, suppose 
that p evaluates to on u and to 1 on mi. This implies the following two facts. First, 
from property Eq. (12) of the definition of canonical abstraction, p also evaluates to 1 
on all concrete values mapped to ui by canonical; in particular, p must evaluate to 1 
on u^. Second, recall that by assumption, each conjunct of node^ must hold, i.e., for 
each predicate p e Vi, S^,[w i— > u^] \= (^^^"^(w). Because p evaluates to on u. 



36 



we get from Definition 9 that S^,[w u'^] ^ (w), which means {p){u^) =Oand 
a contradiction is obtained. 

Proof of the only-if direction: Suppose that $\mathit{canonical}(u^\natural) = u$. Because S is an ICA, by Lemma 4(iii) we know that all unary predicates have definite values in S. Let p be a unary predicate, and let $B \in \{0, 1\}$ be such that $\iota^S(p)(u) = B$. Because p has the definite value B on u in S, by Eq. (12) it must have the same definite value B on all concrete nodes in $S^\natural$ that are mapped to u by canonical; in particular, on $u^\natural$: $\iota^{S^\natural}(p)(u^\natural) = B$. Therefore, using Definition 9, $S^\natural, [w \mapsto u^\natural] \models p^B(w)$; in other words, $S^\natural, [w \mapsto u^\natural] \models p^{\iota^S(p)(u)}(w)$. This holds for all unary predicates, and thus holds for their conjunction as well, i.e., for the formula $\mathit{node}^S_u$.
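To make Lemma 5 concrete, the following sketch evaluates $\mathit{node}^S_u(w)$ from Eq. (27) under the literal semantics of Definition 9 ($p^1 = p$, $p^0 = \neg p$, $p^{1/2} = \mathit{true}$) and checks, on a toy example, that it holds exactly for the concrete elements whose canonical name matches u. The dictionary encoding, the value `HALF = 0.5` for 1/2, and the predicate names `x` and `r` are assumptions made for illustration only.

```python
from typing import Dict, Hashable

HALF = 0.5  # assumed encoding of the indefinite value 1/2

# Hypothetical encoding: unary[p][element] is the value of predicate p on element.
Unary = Dict[str, Dict[Hashable, float]]


def literal_holds(concrete_value: int, abstract_value: float) -> bool:
    """p^B(w) in the style of Definition 9: p^1 = p, p^0 = not p, p^{1/2} = true."""
    if abstract_value == HALF:
        return True
    return concrete_value == abstract_value


def node_holds(abs_unary: Unary, u: Hashable, conc_unary: Unary, u_c: Hashable) -> bool:
    """Evaluate node^S_u(w) of Eq. (27) under the assignment [w -> u_c]."""
    return all(literal_holds(conc_unary[p][u_c], abs_unary[p][u]) for p in abs_unary)


def canonical_by_name(abs_unary: Unary, conc_unary: Unary, u_c: Hashable) -> Hashable:
    """Return the abstract element whose unary values equal u_c's canonical name."""
    elems = next(iter(abs_unary.values())).keys()
    for u in elems:
        if all(abs_unary[p][u] == conc_unary[p][u_c] for p in abs_unary):
            return u
    raise ValueError("no abstract element has this canonical name")


# A tiny ICA: u0 is the node pointed to by x, u1 stands for the other reachable nodes.
abs_unary = {"x": {"u0": 1, "u1": 0}, "r": {"u0": 1, "u1": 1}}
conc_unary = {"x": {"c0": 1, "c1": 0, "c2": 0}, "r": {"c0": 1, "c1": 1, "c2": 1}}

# Lemma 5: canonical(u_c) == u  iff  node^S_u(w) holds at [w -> u_c].
for u_c in ["c0", "c1", "c2"]:
    for u in ["u0", "u1"]:
        same = canonical_by_name(abs_unary, conc_unary, u_c) == u
        assert same == node_holds(abs_unary, u, conc_unary, u_c)
```

Because every unary value in an ICA is definite, each literal in the conjunction pins down one component of the canonical name, which is exactly why the two sides of the equivalence coincide.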

Theorem 2 For every 3-valued structure S that is an ICA and 2-valued structure $S^\natural$:

$$S^\natural \in \gamma_c(S) \iff S^\natural \models \widehat{\gamma}_c(S)$$

Proof: In Lemma D5, we show that the if-direction holds, i.e., a 3-valued structure S is the canonical abstraction of every concrete structure satisfying the characteristic formula $\widehat{\gamma}_c(S)$; in Lemma D6 we show the other direction.

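Lemma D5 Let S be an ICA with set of individuals $U = \{u_1, u_2, \ldots, u_n\}$. Let $\mathit{node}^S_{u_i}(w)$ be an arbitrary formula with free variable w, used in $\widehat{\gamma}_c(S)$, such that Lemma 1 holds. Then, for all $S^\natural$ such that $S^\natural \models \widehat{\gamma}_c(S)$, S is a canonical abstraction of $S^\natural$.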

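Proof: Let $S^\natural = \langle U^\natural, \iota^{S^\natural} \rangle$ be a concrete structure such that $S^\natural \models \widehat{\gamma}_c(S)$. We shall construct a surjective function $\mathit{canonical} : U^\natural \to U$ such that S is a canonical abstraction of $S^\natural$. From Definition 17 it follows, in particular, that $S^\natural$ satisfies the first line of Eq. (8). Let $Z^\natural$ be an assignment over $v_1, \ldots, v_n$ such that $S^\natural, Z^\natural \models \varphi$, where $\varphi = \bigwedge_{i=1}^{n} \mathit{node}^S_{u_i}(v_i)$ ($\varphi$ is the first line of Eq. (8) without the existential quantification). Note that all $Z^\natural(v_i)$ are distinct, according to Lemma 1. Define the function $\mathit{canonical} : U^\natural \to U$ by:

$$\mathit{canonical}(u^\natural) = \begin{cases} u_i & \text{if } Z^\natural(v_i) = u^\natural \\ u_j & \text{if } Z^\natural(v_i) \neq u^\natural \text{ for all } i, \text{ where } u_j \text{ is an arbitrary element such that } S^\natural, [w \mapsto u^\natural] \models \mathit{node}^S_{u_j}(w) \end{cases} \qquad (28)$$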

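Let us show that every concrete element is mapped to some element in U. In the case that $Z^\natural(v_i) = u^\natural$, the concrete element $u^\natural$ is mapped to $u_i \in U$ by canonical; because the $Z^\natural(v_i)$ are distinct, this case applies to at most one i. Otherwise, because $S^\natural \models \xi^S_{\mathit{total}}$ holds, at least one of its disjuncts must be satisfied by each $u^\natural$, i.e., $S^\natural, [w \mapsto u^\natural]$ must satisfy $\mathit{node}^S_{u_j}(w)$ for some $u_j$; thus the definition of canonical maps $u^\natural$ to this $u_j$. Therefore, $\mathit{canonical}(u^\natural)$ is well-defined.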

In addition, for every element $u_i \in U$ there is a concrete element $u^\natural_i \in U^\natural$, namely $u^\natural_i = Z^\natural(v_i)$, that canonical maps to $u_i$. According to Lemma 1, all such elements $u^\natural_i$ are different. Therefore, canonical is surjective.
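The two-case construction of Eq. (28) can be sketched as a small function. This is an illustration under assumed encodings: the dictionary `z` (keyed by the abstract element $u_i$ and holding the witness $Z^\natural(v_i)$), the helper names, and the single predicate `x` are all hypothetical, and the second case simply picks the first matching $u_j$ where the proof allows an arbitrary one.

```python
from typing import Dict, Hashable, List

HALF = 0.5  # assumed encoding of 1/2
Unary = Dict[str, Dict[Hashable, float]]


def node_holds(abs_unary: Unary, u: Hashable, conc_unary: Unary, u_c: Hashable) -> bool:
    """node^S_u(w) at [w -> u_c]: every unary literal p^{iota^S(p)(u)} holds."""
    return all(abs_unary[p][u] == HALF or conc_unary[p][u_c] == abs_unary[p][u]
               for p in abs_unary)


def build_canonical(abs_unary: Unary, abstract: List[Hashable], conc_unary: Unary,
                    concrete: List[Hashable],
                    z: Dict[Hashable, Hashable]) -> Dict[Hashable, Hashable]:
    """Eq. (28). z[u_i] plays the role of Z(v_i), the witness chosen for v_i."""
    canonical = {}
    for u_c in concrete:
        witnessed = [u for u in abstract if z[u] == u_c]
        if witnessed:
            # First case of Eq. (28); Lemma 1 guarantees at most one such u_i.
            canonical[u_c] = witnessed[0]
            continue
        # Second case: xi_total promises that some node formula holds; pick one.
        matches = [u for u in abstract if node_holds(abs_unary, u, conc_unary, u_c)]
        if not matches:
            raise ValueError("xi_total is violated for " + str(u_c))
        canonical[u_c] = matches[0]
    return canonical


def is_surjective(mapping: Dict[Hashable, Hashable], codomain: List[Hashable]) -> bool:
    return set(mapping.values()) == set(codomain)


abs_unary = {"x": {"u0": 1, "u1": 0}}
conc_unary = {"x": {"c0": 1, "c1": 0, "c2": 0}}
z = {"u0": "c0", "u1": "c1"}  # an assignment satisfying the first line of Eq. (8)
canonical = build_canonical(abs_unary, ["u0", "u1"], conc_unary, ["c0", "c1", "c2"], z)
print(canonical)  # {'c0': 'u0', 'c1': 'u1', 'c2': 'u1'}
print(is_surjective(canonical, ["u0", "u1"]))  # True
```

Surjectivity comes for free from the witnesses in `z`, mirroring the argument above: each $u_i$ is hit at least by $Z^\natural(v_i)$.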

We shall show that canonical satisfies Eq. (11) and Eq. (12); that is, canonical identifies S as the canonical abstraction of $S^\natural$.

First, let us show that Eq. (12) holds for the abstraction imposed by canonical, namely, that every predicate p in S has the most precise abstract value w.r.t. the concrete values that canonical maps to it.

Because S is an ICA, all nullary predicates in S must have definite values, by Lemma 4(ii). $S^\natural$ satisfies $\xi^S_{\mathit{nullary}}$; therefore, by Definition 9, nullary predicates in $S^\natural$ must have the same definite values as in S. This shows that Eq. (12) holds for nullary predicates.

Because S is an ICA, all unary predicates in S must have definite values, by Lemma 4(iii). Let p be a unary predicate and let $u \in U$ be an individual of S such that $\iota^S(p)(u) = b$. We shall show that p has the same definite value b on all concrete elements mapped to u by canonical. Because the join of these values is also b, we will get that Eq. (12) holds for p and u. Recall that $S^\natural$ satisfies the formula $\xi^S[p]$; hence each assignment to w satisfies the conjunct $\mathit{node}^S_u(w) \rightarrow p^b(w)$ of $\xi^S[p]$. Let $u^\natural \in U^\natural$ be an individual such that $\mathit{canonical}(u^\natural) = u$, and consider an assignment in which w is mapped to $u^\natural$. By the definition of canonical, this assignment satisfies $\mathit{node}^S_u(w)$, the premise of the conjunct. Therefore, it satisfies the conclusion, i.e., $S^\natural, [w \mapsto u^\natural]$ satisfies $p^b(w)$. Using Definition 9 we get that $\iota^{S^\natural}(p)(u^\natural) = b$.

Let p be a predicate of arity $r > 1$. If p has a definite value b in S on a tuple $u_1, \ldots, u_r$, then $\xi^S[p]$ requires that p evaluate to the same definite value b on every concrete tuple $u^\natural_1, \ldots, u^\natural_r$ such that $\mathit{canonical}(u^\natural_i) = u_i$ (by the same argument as for unary predicates). Therefore, the join operation returns b as the most precise abstract value of p for these concrete tuples. Otherwise, if p evaluates to 1/2 on $u_1, \ldots, u_r \in U$, then, because $S^\natural \models \tau^S[p]$, there must be two tuples of elements in $U^\natural$ that canonical maps to $u_1, \ldots, u_r$, say $u^\natural_{01}, \ldots, u^\natural_{0r}$ and $u^\natural_{11}, \ldots, u^\natural_{1r}$, such that $S^\natural, [w_1 \mapsto u^\natural_{01}, \ldots, w_r \mapsto u^\natural_{0r}] \models \neg p(w_1, \ldots, w_r)$ and $S^\natural, [w_1 \mapsto u^\natural_{11}, \ldots, w_r \mapsto u^\natural_{1r}] \models p(w_1, \ldots, w_r)$. Thus, p evaluates to 0 on the first tuple and to 1 on the second tuple of the concrete structure; therefore, the most precise value obtained by the join operation on these values is 1/2.
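The join-based reading of Eq. (12) used in this argument can be sketched directly: the abstract value of p on a tuple is the least upper bound, in the information order, of p over all concrete tuples that canonical maps onto it. The encoding (`HALF = 0.5`, predicates as dictionaries over tuples) and the example list structure `c0 -> c1 -> c2` with predicate name `n` are assumptions for illustration.

```python
from functools import reduce
from itertools import product
from typing import Dict, Hashable, List, Tuple

HALF = 0.5  # assumed encoding of 1/2


def join(a: float, b: float) -> float:
    """Least upper bound in the information order, where 0 and 1 lie below 1/2."""
    return a if a == b else HALF


def abstract_value(pred: Dict[Tuple, int], canonical: Dict[Hashable, Hashable],
                   abstract_tuple: Tuple, concrete: List[Hashable]) -> float:
    """Eq. (12): join p over every concrete tuple mapped componentwise onto abstract_tuple."""
    values = [pred[t] for t in product(concrete, repeat=len(abstract_tuple))
              if all(canonical[c] == a for c, a in zip(t, abstract_tuple))]
    # canonical is surjective, so values is non-empty and reduce is well defined.
    return reduce(join, values)


concrete = ["c0", "c1", "c2"]
canonical = {"c0": "u0", "c1": "u1", "c2": "u1"}
edges = {("c0", "c1"), ("c1", "c2")}  # hypothetical list c0 -> c1 -> c2
n = {(a, b): int((a, b) in edges) for a in concrete for b in concrete}

print(abstract_value(n, canonical, ("u0", "u1"), concrete))  # 0.5: n(c0,c1)=1, n(c0,c2)=0
print(abstract_value(n, canonical, ("u1", "u0"), concrete))  # 0: no edge back to c0
```

Since concrete values are 2-valued, the join is definite exactly when all contributing values agree, which is the dichotomy the paragraph above splits on.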

We shall show that canonical satisfies Eq. (11), i.e., it maps elements according to 
their canonical names. This involves showing two directions: 

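1. For the sake of contradiction, assume that there are two distinct elements $u^\natural_0, u^\natural_1 \in U^\natural$ that have the same canonical name (meaning that for all $p \in \mathcal{P}_1$, $\iota^{S^\natural}(p)(u^\natural_0) = \iota^{S^\natural}(p)(u^\natural_1)$), but $\mathit{canonical}(u^\natural_0) \neq \mathit{canonical}(u^\natural_1)$. Because S is a bounded structure, there must be a unary predicate p that evaluates to different definite values on these two abstract elements; without loss of generality, p evaluates to 0 on $\mathit{canonical}(u^\natural_0)$ and to 1 on $\mathit{canonical}(u^\natural_1)$. As shown above, p evaluates to the same definite values in the concrete structure $S^\natural$: $\iota^{S^\natural}(p)(u^\natural_0) = 0$ and $\iota^{S^\natural}(p)(u^\natural_1) = 1$, and a contradiction is obtained.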

2. For the sake of contradiction, assume that two concrete elements, denoted by $u^\natural_0, u^\natural_1 \in U^\natural$, have different canonical names, but are mapped by canonical to the same element in U: $\mathit{canonical}(u^\natural_0) = \mathit{canonical}(u^\natural_1)$, denoted by u. By the definition of canonical, $S^\natural, [w \mapsto u^\natural_i]$ satisfies $\mathit{node}^S_{\mathit{canonical}(u^\natural_i)}(w)$ for $i = 0, 1$; in other words, $S^\natural, [w \mapsto u^\natural_i]$ satisfies $\mathit{node}^S_u(w)$. Therefore, it satisfies each conjunct of the node formula, i.e., for all p, $S^\natural, [w \mapsto u^\natural_i]$ satisfies $p^{\iota^S(p)(u)}(w)$. From this and the fact that all unary predicates in S have definite values because S is an ICA, we conclude by Definition 9 that $\iota^{S^\natural}(p)(u^\natural_i) = \iota^S(p)(u)$. Therefore, $\iota^{S^\natural}(p)(u^\natural_0) = \iota^S(p)(u)$ and $\iota^{S^\natural}(p)(u^\natural_1) = \iota^S(p)(u)$ for all $p \in \mathcal{P}_1$. Hence $u^\natural_0$ and $u^\natural_1$ have the same canonical name, and a contradiction is obtained.
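Putting Eq. (11) and Eq. (12) together gives the usual way to compute a canonical abstraction: group concrete elements by canonical name, then join predicate values over each group. The sketch below does this for unary predicates plus binary predicates only; the encoding, the use of the name tuple itself as the abstract element, and the example predicates `x` and `n` are assumptions, not the paper's representation.

```python
from itertools import product
from typing import Dict, Hashable, List, Tuple

HALF = 0.5  # assumed encoding of 1/2
Name = Tuple[int, ...]


def canonical_name(conc_unary: Dict[str, Dict[Hashable, int]], u_c: Hashable) -> Name:
    """Values of all unary predicates on u_c, in a fixed (sorted) predicate order."""
    return tuple(conc_unary[p][u_c] for p in sorted(conc_unary))


def canonical_abstraction(conc_unary, conc_binary, concrete: List[Hashable]):
    """Map each element to its canonical name (Eq. (11)); join binary values (Eq. (12))."""
    canonical = {u_c: canonical_name(conc_unary, u_c) for u_c in concrete}
    abstract = sorted(set(canonical.values()))
    abs_binary = {}
    for p, table in conc_binary.items():
        abs_binary[p] = {}
        for a, b in product(abstract, repeat=2):
            # Values of p on all concrete pairs that canonical maps onto (a, b).
            vals = {table[(x, y)] for x in concrete for y in concrete
                    if canonical[x] == a and canonical[y] == b}
            # Joining 2-valued values gives v if they all equal v, else 1/2.
            abs_binary[p][(a, b)] = vals.pop() if len(vals) == 1 else HALF
    return canonical, abstract, abs_binary


concrete = ["c0", "c1", "c2"]
conc_unary = {"x": {"c0": 1, "c1": 0, "c2": 0}}
edges = {("c0", "c1"), ("c1", "c2")}
conc_binary = {"n": {(a, b): int((a, b) in edges) for a in concrete for b in concrete}}

canonical, abstract, abs_binary = canonical_abstraction(conc_unary, conc_binary, concrete)
print(abstract)                       # [(0,), (1,)]
print(abs_binary["n"][((1,), (0,))])  # 0.5
print(abs_binary["n"][((0,), (1,))])  # 0
```

Using the canonical name as the abstract element makes both directions of Eq. (11) hold by construction, which is what the two-case argument above establishes for the function built from $\widehat{\gamma}_c(S)$.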

Lemma D6 For every 3-valued structure S that is an ICA and every 2-valued structure $S^\natural$ such that $S^\natural \models F$ and S is the canonical abstraction of $S^\natural$, $S^\natural \models \widehat{\gamma}_c(S)$.
Proof: Let $\mathit{canonical} : U^\natural \to U$ be the mapping that identifies S as the canonical abstraction of $S^\natural$; canonical is a surjective function and possesses the properties in Eq. (11) and Eq. (12).

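First, we show that $S^\natural$ satisfies the first line of Eq. (8), $\exists v_1, \ldots, v_n . \bigwedge_{i=1}^{n} \mathit{node}^S_{u_i}(v_i)$. Let $u^\natural_i$ be an arbitrary element such that $\mathit{canonical}(u^\natural_i) = u_i$; such an element must exist because canonical is surjective. Define an assignment $Z^\natural$ such that $Z^\natural(v_i) = u^\natural_i$. Because S is canonical-FO-identifiable, by Lemma 5 we conclude that for every $1 \le i \le n$, $S^\natural, Z^\natural \models \mathit{node}^S_{u_i}(v_i)$. According to Lemma 1, all the $u^\natural_i$ are distinct elements.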

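Because canonical is a function, for every $u^\natural \in U^\natural$ there is a $u \in U$ such that $\mathit{canonical}(u^\natural) = u$. Then, by Definition 16, $S^\natural, [w \mapsto u^\natural] \models \mathit{node}^S_u(w)$, i.e., every assignment to w in $S^\natural$ satisfies some disjunct of $\xi^S_{\mathit{total}}$. That is, $S^\natural$ satisfies $\xi^S_{\mathit{total}}$.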

Because S is an ICA, nullary predicates have the same definite values in S and in $S^\natural$, by Lemma 4(ii). Therefore, by Definition 9, $S^\natural$ satisfies $p^{\iota^S(p)()}$ for every nullary predicate $p \in \mathcal{P}_0$, which means that $S^\natural$ satisfies $\xi^S_{\mathit{nullary}}$.

Let $p \in \mathcal{P}$ be a predicate of arity r. Let $u^\natural_1, \ldots, u^\natural_r \in U^\natural$, and let $Z^\natural$ be an assignment such that $Z^\natural(w_i) = u^\natural_i$. We shall show that $S^\natural, Z^\natural$ satisfies the body of Eq. (7). Consider a conjunct of the body. If the premise of the implication in this conjunct is not satisfied, then the conjunct vacuously holds. Otherwise, $S^\natural, Z^\natural \models \mathit{node}^S_{u_i}(w_i)$ for all $i = 1, \ldots, r$. Then, by Lemma 5, $\mathit{canonical}(u^\natural_i) = u_i$. We have two cases to consider: (i) if $\iota^S(p)(u_1, \ldots, u_r) = b \in \{0, 1\}$, then by Eq. (12) $\iota^{S^\natural}(p)(u^\natural_1, \ldots, u^\natural_r) = b$; in other words, $S^\natural, Z^\natural$ satisfies $p^b(w_1, \ldots, w_r)$. (ii) If $\iota^S(p)(u_1, \ldots, u_r) = 1/2$, then by Definition 9, $p^{\iota^S(p)(u_1, \ldots, u_r)}(w_1, \ldots, w_r) = p^{1/2}(w_1, \ldots, w_r) = 1$, which holds for any assignment.

To complete the proof, we show that for every $p \in \mathcal{P}_r$ of arity $r > 1$, $\tau^S[p]$ holds. Let p be a predicate that evaluates to 1/2 on a tuple $u_1, \ldots, u_r \in U$. Because S is an ICA, $\iota^S(p)(u_1, \ldots, u_r) = 1/2$ means that the join operation in Eq. (12) yields 1/2. By the definition of join as the least upper bound, and using the information order in Definition 4, we conclude the following. (i) $S^\natural$ must contain at least two distinct tuples, denoted by $u^\natural_{01}, \ldots, u^\natural_{0r}$ and $u^\natural_{11}, \ldots, u^\natural_{1r}$, such that $\mathit{canonical}(u^\natural_{ij}) = u_j$ for $i = 0, 1$ and $j = 1, \ldots, r$; by Lemma 5 we get that $S^\natural, [w \mapsto u^\natural_{ij}] \models \mathit{node}^S_{u_j}(w)$, and therefore each tuple satisfies $\bigwedge_{j=1}^{r} \mathit{node}^S_{u_j}(w_j)$. (ii) p evaluates to 0 on the first tuple and to 1 on the second tuple. This shows that $S^\natural \models \tau^S[p]$.
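The witnesses required in cases (i) and (ii) can be searched for mechanically. The hypothetical helper below scans the concrete tuples that canonical maps onto a given abstract tuple and returns the first one on which p is 0 and the first one on which p is 1; for a 2-valued $S^\natural$, both exist exactly when the join in Eq. (12) is 1/2. The encoding and the example data are assumptions for illustration.

```python
from itertools import product
from typing import Dict, Hashable, List, Optional, Tuple


def tau_witnesses(pred: Dict[Tuple, int], canonical: Dict[Hashable, Hashable],
                  abstract_tuple: Tuple, concrete: List[Hashable]
                  ) -> Tuple[Optional[Tuple], Optional[Tuple]]:
    """Return (t0, t1): concrete tuples over abstract_tuple with p(t0) = 0 and p(t1) = 1.

    Either entry is None when no such tuple exists.
    """
    zero: Optional[Tuple] = None
    one: Optional[Tuple] = None
    for t in product(concrete, repeat=len(abstract_tuple)):
        if not all(canonical[c] == a for c, a in zip(t, abstract_tuple)):
            continue  # t does not satisfy the node formulas for abstract_tuple
        if pred[t] == 0 and zero is None:
            zero = t
        elif pred[t] == 1 and one is None:
            one = t
    return zero, one


concrete = ["c0", "c1", "c2"]
canonical = {"c0": "u0", "c1": "u1", "c2": "u1"}
edges = {("c0", "c1"), ("c1", "c2")}
n = {(a, b): int((a, b) in edges) for a in concrete for b in concrete}

print(tau_witnesses(n, canonical, ("u0", "u1"), concrete))  # (('c0', 'c2'), ('c0', 'c1'))
print(tau_witnesses(n, canonical, ("u1", "u0"), concrete))  # (('c1', 'c0'), None)
```

The second call finds no tuple with value 1, which matches the definite abstract value 0 on $(u_1, u_0)$ in this example.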

Lemma 6 Denote by $\mathcal{D}$ the set of all 2-valued structures that satisfy the integrity formula F: $\mathcal{D} = \{S^\natural \in 2\text{-STRUCT}[\mathcal{P}] \mid S^\natural \models F\}$. Let S be an ICA structure. There exists a set of ICA structures X such that $\gamma_c(X) = \mathcal{D} \setminus \gamma_c(S)$.

Proof: Denote by Y the set of all ICA structures over a fixed vocabulary $\mathcal{P}$, i.e., $\gamma_c(Y) = \mathcal{D}$. We claim that X is defined by $Y \setminus \{S\}$. By definition, $\gamma_c(X) = \gamma_c(Y \setminus \{S\})$, and we show that $\gamma_c(Y \setminus \{S\}) = \gamma_c(Y) \setminus \gamma_c(S)$. By the definitions of Y and $\gamma_c$ in Eq. (13), $\gamma_c(Y \setminus \{S\}) \supseteq \mathcal{D} \setminus \gamma_c(S)$ holds. To complete the proof, we show that the other direction of inclusion holds as well. For the sake of argument, assume that there exists a 2-valued structure $S^\natural$ that belongs to both $\gamma_c(S)$ and $\gamma_c(Y \setminus \{S\})$. Thus, by Definition 15, there exists an ICA structure $S'$ such that $\mathit{canonical}(S^\natural) = S'$, and $S'$ is different from S. From Eq. (12), it follows that $\mathit{canonical}(S^\natural) \neq S$, which contradicts the assumption that $S^\natural \in \gamma_c(S)$.
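Lemma 6 rests on the fact that each concrete structure has exactly one canonical abstraction, so the sets $\gamma_c(T)$ for distinct ICAs T partition $\mathcal{D}$. The brute-force check below illustrates this on a toy setting. It assumes, for simplicity, that the vocabulary has only unary predicates besides equality and that F is trivially true; under that assumption a concrete structure is a tuple of canonical names, and its canonical abstraction up to isomorphism is the set of pairs (canonical name, whether the node is a summary node).

```python
from collections import Counter
from itertools import product
from typing import Dict, FrozenSet, List, Set, Tuple

Name = Tuple[int, ...]
Structure = Tuple[Name, ...]
AbstractNode = Tuple[Name, bool]  # (canonical name, is it a summary node?)


def abstraction(structure: Structure) -> FrozenSet[AbstractNode]:
    """Canonical abstraction up to isomorphism in the unary-only toy setting."""
    counts = Counter(structure)
    return frozenset((name, counts[name] > 1) for name in counts)


def all_structures(num_preds: int, max_size: int) -> List[Structure]:
    """Every nonempty concrete structure with at most max_size elements."""
    names = list(product((0, 1), repeat=num_preds))
    result: List[Structure] = []
    for size in range(1, max_size + 1):
        result.extend(product(names, repeat=size))
    return result


D = all_structures(num_preds=2, max_size=3)  # stands in for the set D
Y = {abstraction(s) for s in D}              # the ICAs that arise from D
gamma: Dict[FrozenSet[AbstractNode], Set[Structure]] = {
    T: {s for s in D if abstraction(s) == T} for T in Y}

S = frozenset({((1, 0), False), ((0, 0), True)})
X = Y - {S}
complement = set(D) - gamma[S]
union_over_X = set().union(*(gamma[T] for T in X))
print(complement == union_over_X)                      # True
print(sum(len(g) for g in gamma.values()) == len(D))  # True: the gamma sets partition D
```

This only exercises a finite fragment, so it illustrates the argument rather than proving it; the real proof needs the uniqueness of canonical abstraction for arbitrary vocabularies.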
Lemma 7 Consider the formula $\widehat{\gamma}_c(S)$ from Eq. (17), for some ICA structure S. There exists a set of ICA structures X such that the formula $F \wedge \neg\widehat{\gamma}_c(S)$ is equivalent to the formula $\widehat{\gamma}_c(X)$.

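Proof: Let $\mathcal{D}$ be the set of all 2-valued structures that satisfy the integrity formula F. Let X be the set of ICA structures that describes the complement of $\gamma_c(S)$, as given by Lemma 6. Let $S^\natural$ be a 2-valued structure. Then $S^\natural \in \gamma_c(X)$ if and only if $S^\natural \in \mathcal{D} \setminus \gamma_c(S)$, and the right-hand side simplifies to $S^\natural \in \mathcal{D}$ and $S^\natural \notin \gamma_c(S)$. Applying Theorem 2, we get that $S^\natural \models \widehat{\gamma}_c(X)$ if and only if $S^\natural$ satisfies F but does not satisfy $\widehat{\gamma}_c(S)$. Using Eq. (18), this is equivalent to $S^\natural \models F \wedge \neg\widehat{\gamma}_c(S)$.

Theorem 3 For every 3-valued structure S and 2-valued structure $S^\natural$:

$$S^\natural \in \gamma(S) \iff S^\natural \models \widehat{\gamma}_{NP}(S)$$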

Proof: In Lemma D7, we show that the if-direction holds, i.e., every concrete structure satisfying the NP-characteristic formula $\widehat{\gamma}_{NP}(S)$ is indeed in $\gamma(S)$. In Lemma D8 we show the only-if part.

Lemma D7 Let S be a logical structure with set of individuals $U = \{u_1, u_2, \ldots, u_n\}$. Then, for all $S^\natural$ such that $S^\natural \models \widehat{\gamma}_{NP}(S)$, $S^\natural \in \gamma(S)$.

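Proof: Let $S^\natural = \langle U^\natural, \iota^{S^\natural} \rangle$ be a concrete structure such that $S^\natural \models \widehat{\gamma}_{NP}(S)$. We shall construct a surjective function $f : U^\natural \to U$ such that $S^\natural \sqsubseteq^f S$. Let $Z^\natural$ be an assignment such that $S^\natural, Z^\natural \models \psi$, where $\psi$ is the body of $\widehat{\gamma}_{NP}(S)$ without the existential quantifiers on sets. Let $Z^\natural(V_i) = U_i \subseteq U^\natural$. Consider the following definition:

$$f(u^\natural) = \{u_i \mid u^\natural \in U_i\} \qquad (29)$$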

$f(u^\natural)$ is a set of size at most 1 because the pair $S^\natural, Z^\natural$ satisfies the sub-formula $\xi^S_{\mathit{disjoint}}$. This ensures that the sets $U_1, \ldots, U_n$ are disjoint, i.e., each concrete element belongs to at most one set. For simplicity, we write $f(u^\natural) = u_i$ whenever $f(u^\natural) = \{u_i\}$.

We shall show that every concrete element is mapped by f to some element in U. Because $S^\natural, Z^\natural$ satisfies $\xi^S_{\mathit{total}}$, we conclude that every concrete element satisfies the formula $\mathit{node}^S_{u_i}(w)$ for some $u_i$. Also, $\mathit{node}^S_{u_i}(w)$, given in Definition 18, is a membership test in the set $V_i$; therefore, every concrete element must be a member of some set $U_i$. Thus, $u^\natural$ is mapped to $u_i \in U$ by the definition of f in Eq. (29). This shows that f is well-defined.

Because $S^\natural, Z^\natural$ satisfies $\xi^S_{\mathit{nonempty}}[V_i]$ for $i = 1, \ldots, n$, every $U_i$ must contain at least one element, say $u^\natural_i$, that is mapped to $u_i$ by f. Because the sets are disjoint, all such elements $u^\natural_i$ are different. Therefore, f is surjective.
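The way Eq. (29) turns the chosen sets $U_i$ into a function can be sketched as follows; each check corresponds to one of the conjuncts used above ($\xi^S_{\mathit{disjoint}}$, $\xi^S_{\mathit{total}}$, $\xi^S_{\mathit{nonempty}}$). The dictionary `sets`, standing for the values $Z^\natural(V_i)$, and the element names are hypothetical.

```python
from typing import Dict, Hashable, List, Set


def f_from_sets(sets: Dict[Hashable, Set[Hashable]],
                concrete: List[Hashable]) -> Dict[Hashable, Hashable]:
    """Eq. (29): f(u_c) is the unique u_i with u_c in U_i (writing u_i for {u_i})."""
    if any(not members for members in sets.values()):
        raise ValueError("xi_nonempty violated: some U_i is empty")
    f = {}
    for u_c in concrete:
        owners = [u for u, members in sets.items() if u_c in members]
        if len(owners) > 1:
            raise ValueError("xi_disjoint violated: %r lies in several sets" % (u_c,))
        if not owners:
            raise ValueError("xi_total violated: %r lies in no set" % (u_c,))
        f[u_c] = owners[0]
    return f


sets = {"u0": {"c0"}, "u1": {"c1", "c2"}}  # hypothetical values Z(V_i) = U_i
f = f_from_sets(sets, ["c0", "c1", "c2"])
print(f)                             # {'c0': 'u0', 'c1': 'u1', 'c2': 'u1'}
print(set(f.values()) == set(sets))  # True: f is surjective
```

Rejecting overlapping or missing memberships is what makes the singleton shorthand $f(u^\natural) = u_i$ safe.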

Let p be a nullary predicate. Because $S^\natural$ satisfies $\xi^S_{\mathit{nullary}}$, it satisfies each conjunct; in particular, $S^\natural \models p^{\iota^S(p)()}$. Using Lemma 2 we get that $\iota^{S^\natural}(p)() \sqsubseteq \iota^S(p)()$.

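Let $p \in \mathcal{P}$ be a predicate of arity $r \ge 1$. Let $u^\natural_1, u^\natural_2, \ldots, u^\natural_r \in U^\natural$, and let us show that

$$\iota^{S^\natural}(p)(u^\natural_1, u^\natural_2, \ldots, u^\natural_r) \sqsubseteq \iota^S(p)(f(u^\natural_1), f(u^\natural_2), \ldots, f(u^\natural_r)) \qquad (30)$$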

Let $z^\natural$ be an extension of assignment $Z^\natural$ such that $z^\natural(w_i) = u^\natural_i$ for $i = 1, \ldots, r$. Because $S^\natural, Z^\natural \models \xi^S[p]$, we conclude that $S^\natural, z^\natural$ satisfies the body of Eq. (7). Consider the conjunct of the body with premise $\bigwedge_{j=1}^{r} \mathit{node}^S_{f(u^\natural_j)}(w_j)$. By the definition of f, $S^\natural, [w_j \mapsto u^\natural_j]$ satisfies $\mathit{node}^S_{f(u^\natural_j)}(w_j)$ for all $j = 1, \ldots, r$, which means that the premise is satisfied by $S^\natural, z^\natural$. Therefore, the conclusion must hold:

$$S^\natural, z^\natural \models p^{\iota^S(p)(f(u^\natural_1), \ldots, f(u^\natural_r))}(w_1, \ldots, w_r),$$

and the result follows from Lemma 2.
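The embedding condition that Lemma D7 establishes, Eq. (30) for every predicate and tuple together with surjectivity of f, can be checked directly on small structures. In the sketch below, predicates are encoded as `name -> (arity, table)`, and the example structures (predicates `x` and `n`, elements `c0`, `c1`, `c2`, `u0`, `u1`) are hypothetical.

```python
from itertools import product
from typing import Dict, Hashable, List, Tuple

HALF = 0.5  # assumed encoding of 1/2
# Hypothetical encoding: name -> (arity, {tuple of elements: value}).
Preds = Dict[str, Tuple[int, Dict[Tuple, float]]]


def info_leq(a: float, b: float) -> bool:
    """a is below b in the information order: equal values, or b is 1/2."""
    return a == b or b == HALF


def embeds(conc: Preds, abst: Preds, f: Dict[Hashable, Hashable],
           concrete: List[Hashable], abstract: List[Hashable]) -> bool:
    """True if f is surjective and Eq. (30) holds for every predicate and tuple."""
    if set(f.values()) != set(abstract):
        return False
    for p, (arity, table) in conc.items():
        abs_table = abst[p][1]
        for t in product(concrete, repeat=arity):
            if not info_leq(table[t], abs_table[tuple(f[c] for c in t)]):
                return False
    return True


concrete, abstract = ["c0", "c1", "c2"], ["u0", "u1"]
f = {"c0": "u0", "c1": "u1", "c2": "u1"}
edges = {("c0", "c1"), ("c1", "c2")}
conc = {
    "x": (1, {(c,): int(c == "c0") for c in concrete}),
    "n": (2, {(a, b): int((a, b) in edges) for a in concrete for b in concrete}),
}
abst = {
    "x": (1, {("u0",): 1, ("u1",): 0}),
    "n": (2, {("u0", "u0"): 0, ("u0", "u1"): HALF, ("u1", "u0"): 0, ("u1", "u1"): HALF}),
}
print(embeds(conc, abst, f, concrete, abstract))  # True

abst["n"][1][("u0", "u1")] = 0  # now n(c0, c1) = 1 is not below 0
print(embeds(conc, abst, f, concrete, abstract))  # False
```

Nullary predicates fit the same loop, since `product(concrete, repeat=0)` yields the single empty tuple.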

Lemma D8 For every 3-valued structure S and 2-valued structure $S^\natural$ such that $S^\natural \models F$ and $S^\natural \sqsubseteq S$, $S^\natural \models \widehat{\gamma}_{NP}(S)$.

Proof: Let $f : U^\natural \to U$ be a surjective function such that $S^\natural \sqsubseteq^f S$. Define an assignment $Z^\natural$ such that $Z^\natural(V_i) = U_i$, where $U_i = \{u^\natural \mid f(u^\natural) = u_i\}$.

Because f is a surjective function, there must exist at least one concrete element that is mapped to $u_i$ by f. This element belongs to the set $U_i$. Therefore, $S^\natural, Z^\natural \models \bigwedge_{i=1}^{n} \xi^S_{\mathit{nonempty}}[V_i]$.

Because f is a well-defined function, it maps each concrete element to exactly one element $u_i \in U$, which induces the set $U_i$. Therefore, a concrete element cannot belong to more than one set; hence $S^\natural, Z^\natural \models \bigwedge_{k \neq j} \xi^S_{\mathit{disjoint}}[V_k, V_j]$.

Because f is a function, f maps every concrete element to some element in U. Therefore, every concrete element belongs to some set, i.e., satisfies some disjunct of $\xi^S_{\mathit{total}}$. That is, $S^\natural, Z^\natural \models \xi^S_{\mathit{total}}$.
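The choice of sets in Lemma D8 is simply the partition of $U^\natural$ induced by f. The sketch below builds $U_i = \{u^\natural \mid f(u^\natural) = u_i\}$ and evaluates the three set conditions just argued for; the function names and example data are hypothetical, and the example also shows that nonemptiness fails once f misses an abstract element.

```python
from typing import Dict, Hashable, List, Set


def sets_from_f(f: Dict[Hashable, Hashable],
                abstract: List[Hashable]) -> Dict[Hashable, Set[Hashable]]:
    """U_i = {u_c | f(u_c) = u_i}: the value the proof assigns to each V_i."""
    return {u: {c for c, a in f.items() if a == u} for u in abstract}


def set_conjuncts(sets: Dict[Hashable, Set[Hashable]],
                  concrete: List[Hashable]) -> Dict[str, bool]:
    """Evaluate the nonempty, disjoint, and total conditions on the chosen sets."""
    members = [c for s in sets.values() for c in s]
    return {
        "nonempty": all(len(s) > 0 for s in sets.values()),  # needs f surjective
        "disjoint": len(members) == len(set(members)),       # f is single-valued
        "total": set(members) == set(concrete),              # f is defined everywhere
    }


f = {"c0": "u0", "c1": "u1", "c2": "u1"}
sets = sets_from_f(f, ["u0", "u1"])
print({u: sorted(s) for u, s in sets.items()})  # {'u0': ['c0'], 'u1': ['c1', 'c2']}
print(set_conjuncts(sets, ["c0", "c1", "c2"]))
# {'nonempty': True, 'disjoint': True, 'total': True}

# If f were not surjective (u2 has no preimage), nonemptiness would fail.
print(set_conjuncts(sets_from_f(f, ["u0", "u1", "u2"]), ["c0", "c1", "c2"])["nonempty"])  # False
```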

For every nullary predicate $p \in \mathcal{P}_0$, using Eq. (1) and Lemma 2, we conclude that $S^\natural, Z^\natural$ satisfies $p^{\iota^S(p)()}$. Therefore, $S^\natural, Z^\natural \models \xi^S_{\mathit{nullary}}$.

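Let $p \in \mathcal{P}$ be a predicate of arity r. Let $u^\natural_1, \ldots, u^\natural_r \in U^\natural$, and let $z^\natural$ be an extension of assignment $Z^\natural$ such that $z^\natural(w_i) = u^\natural_i$. We shall show that $S^\natural, z^\natural$ satisfies the body of Eq. (7). If the premise of the implication is not satisfied, then the formula vacuously holds. Otherwise, $S^\natural, z^\natural \models \mathit{node}^S_{u_i}(w_i)$ for all $i = 1, \ldots, r$. Then, by Definition 18, $u^\natural_i$ belongs to the set $U_i$. The definition of $U_i$ implies that $f(u^\natural_i) = u_i$. Using Eq. (1), we get $\iota^{S^\natural}(p)(u^\natural_1, \ldots, u^\natural_r) \sqsubseteq \iota^S(p)(f(u^\natural_1), \ldots, f(u^\natural_r))$, which means $\iota^{S^\natural}(p)(u^\natural_1, \ldots, u^\natural_r) \sqsubseteq \iota^S(p)(u_1, \ldots, u_r)$. By Lemma 2 we conclude that $S^\natural, z^\natural$ satisfies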